
last_id is silently not supported for Subject.where()

Open mwalmsley opened this issue 7 years ago • 2 comments

Passing `last_id={id}` to `Subject.where()` appears to have no effect and raises no error.

Test Case

Executing:

```python
subjects = Subject.where(scope='project', project_id='5733')

for n in range(10):
    print(subjects.next())
```

Gives the following result:

```
<Subject 30091684>
<Subject 30091682>
<Subject 30091673>
<Subject 30091670>
<Subject 30091664>
<Subject 30091662>
<Subject 30091656>
<Subject 30091654>
<Subject 30091645>
<Subject 30091641>
```

Adding `last_id=30091682` to the `where()` call gives exactly the same result as above.

mwalmsley avatar Apr 04 '19 12:04 mwalmsley

This is because the subjects API resource lacks the optimized `last_id` support. That was added to speed up the classifications API, but it should be ported to each resource.

Paging through resource result sets via next / previous links is the standard mechanism for all resources, and subjects do work this way. Does that meet your use case here?
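For reference, link-based paging can be consumed roughly like this. This is a minimal sketch, not the client's actual internals; `fetch_page` and the response shape are assumptions modelled on the Panoptes JSON API:

```python
def iter_subjects(fetch_page, first_page=1):
    """Walk a paged API by following each page's next_page link.

    fetch_page(page) is assumed to return a dict shaped like a Panoptes
    response: {'subjects': [...],
               'meta': {'subjects': {'next_page': N or None}}}.
    """
    page = first_page
    while page is not None:
        data = fetch_page(page)
        for subject in data['subjects']:
            yield subject
        page = data['meta']['subjects']['next_page']

# Toy two-page backend, just to show the traversal
def fake_fetch(page):
    pages = {
        1: {'subjects': [{'id': '1'}, {'id': '2'}],
            'meta': {'subjects': {'next_page': 2}}},
        2: {'subjects': [{'id': '3'}],
            'meta': {'subjects': {'next_page': None}}},
    }
    return pages[page]

print([s['id'] for s in iter_subjects(fake_fetch)])  # ['1', '2', '3']
```

The client hides this loop behind its result-set iterator, so in practice you just keep calling `next()`.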

camallen avatar Apr 04 '19 13:04 camallen

I think that I didn't give enough thought to what I actually needed to accomplish here.

I realised that in order for iteratively downloaded (yay for last_id) classifications to be useful, I need the metadata from the subject to link those classifications back to the science catalog.

```
classification <-(links.subject, subject_id)-> subject <-(metadata.science_id, science_id)-> science_catalog
```
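In toy form, the two-hop join looks like this (all dicts here are illustrative stand-ins for the real API payloads and catalog):

```python
# Illustrative data; real payloads come from the Panoptes API / science catalog
classification = {'id': 'c1', 'links': {'subjects': ['30091684']}}
subjects = {'30091684': {'metadata': {'science_id': 'J0001'}}}
science_catalog = {'J0001': {'ra': 10.5, 'dec': -1.2}}

# Hop 1: classification -> subject via links.subjects
subject = subjects[classification['links']['subjects'][0]]
# Hop 2: subject -> catalog row via metadata.science_id
science_row = science_catalog[subject['metadata']['science_id']]
print(science_row)  # {'ra': 10.5, 'dec': -1.2}
```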

My first thought was to download all new subjects with last_id - but of course, that's not how subjects work! Old subjects can get new classifications.

Paging would work to download all subjects, but doing that daily would be slow and would repeat many duplicate calls.

My current solution is to get the specific subject for each new classification:

```python
subject_id = classification['links']['subjects'][0]  # only works for single-subject projects

subject = get_subject(project_id, subject_id)  # assume id is unique

classification['links']['subject'] = subject.raw

save_classification_to_file(classification, save_loc)
```

and decorate `get_subject` (which is simply a thin wrapper around the Python client) with a huge `lru_cache`, on the assumption that subjects tend to appear repeatedly at similar times (i.e. within the currently active subject set).
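Roughly like this, where the fetch function is a stand-in for the real client lookup and the cache size is illustrative:

```python
from functools import lru_cache

calls = {'count': 0}  # instrumentation, just to make the caching effect visible

def fetch_subject(project_id, subject_id):
    # Stand-in for the real Panoptes client call
    calls['count'] += 1
    return {'id': subject_id, 'project_id': project_id}

@lru_cache(maxsize=100000)  # "huge" cache: active subjects repeat often
def get_subject(project_id, subject_id):
    return fetch_subject(project_id, subject_id)

for _ in range(5):
    get_subject('5733', '30091684')

print(calls['count'])  # only 1 underlying fetch for 5 lookups
```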

This saves me from having to maintain an up-to-date duplicate database of all subjects, but it is a bit slow compared to the optimised classification interface.

I would guess that fetching subject details along with classification details would be quite useful for others too, though I'm not sure how best to implement it.

mwalmsley avatar Apr 09 '19 14:04 mwalmsley