VisualBehaviorOphysProjectCache.from_lims() method does not accurately reflect what is in lims
Describe the bug
When I run the following code, the returned table does not include the complete list of ophys_experiment_ids for each ophys_container_id. It appears to require that a given ophys_experiment_id is represented only once. However, in lims an ophys_experiment_id can be associated with multiple ophys_container_ids (some that have passed and some that were failed). As a result, an ophys_experiment_id is sometimes associated with a passed container and other times with a failed one, so a passing ophys_container_id may not have the correct list of ophys_experiment_ids associated with it.
To Reproduce
from allensdk.brain_observatory.behavior.behavior_project_cache import VisualBehaviorOphysProjectCache as bpc
cache = bpc.from_lims()
experiments = cache.get_ophys_experiment_table()
print('there are', len(experiments[experiments.ophys_container_id==1115959875]),
'experiments associated with container id', 1115959875, 'in the cache table')
Expected behavior
There are actually 6 experiments associated with that container ID in lims. This can be observed through direct lims queries, and by looking at the lims directory for that container:

Workaround
@djkapner identified that this part of the code in the SDK enforces that each ophys_experiment_id appears only once in the ophys_experiments table, and suggested the code block below as a workaround. This block gives me the correct list of ophys_experiment_ids in lims for a given ophys_container_id, including cases where an ophys_experiment_id is associated with multiple containers. This is the ground truth that we need access to in order to do QC and other tasks, like identifying candidate experiments for release.
experiments = cache.fetch_api.get_ophys_experiment_table()
print('there are', len(experiments[experiments.ophys_container_id==1115959875]),
'experiments associated with container id', 1115959875, 'in the cache table')
print('yay this is correct')
Environment (please complete the following information):
- AllenSDK version 2.11.0
Deeper link for that explanation: https://github.com/AllenInstitute/AllenSDK/blob/0178688ccdfb4d3b6c311cc8879126f7d64e90a1/allensdk/brain_observatory/behavior/behavior_project_cache/tables/project_table.py#L32-L35
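To illustrate why keeping only one row per ophys_experiment_id loses container associations, here is a minimal sketch on a toy table. The IDs and the `container_state` column are made up for illustration, and `drop_duplicates` is only an assumed stand-in for whatever the linked SDK lines actually do; the point is just the effect on per-container counts.

```python
import pandas as pd

# Toy stand-in for the experiment/container associations in lims.
# All IDs below are hypothetical, not real lims IDs.
lims_rows = pd.DataFrame({
    "ophys_experiment_id": [101, 102, 103, 101, 102, 104],
    "ophys_container_id": [1, 1, 1, 2, 2, 2],
    # Hypothetical column: container 1 failed, container 2 passed.
    "container_state": ["failed"] * 3 + ["passed"] * 3,
})

# Assumption: enforcing a unique ophys_experiment_id behaves roughly like
# keeping the first row seen for each experiment ID.
deduplicated = lims_rows.drop_duplicates(subset="ophys_experiment_id")

# Number of experiments per container, before and after deduplication.
full_counts = lims_rows.groupby("ophys_container_id").size()
dedup_counts = deduplicated.groupby("ophys_container_id").size()

print("experiments per container in lims:\n", full_counts)
print("experiments per container after deduplication:\n", dedup_counts)
```

In this toy case the passed container (2) keeps only one of its three experiments, because the other two were already claimed by the failed container (1). Which container "wins" depends on row order, which would be consistent with the behavior described in the report.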
@matchings I think this branch
https://github.com/AllenInstitute/AllenSDK/tree/ticket/2187/dev
will fix the problem for you. Note the added passed_only kwarg in get_ophys_experiment_table that must be set to False to get everything. If you have any tests you want to run, please do so.
I used this branch to create an experiments table from_lims. It matched your ground truth table, with the exception of 35 experiments that do not appear to be in the S3 bucket now anyway. We can dig into that if you want.
The 35 ophys_experiment_ids that were not returned by a naive passed_only=True query from LIMS were:
could not find 1012165655
could not find 1008408505
could not find 1011771129
could not find 1011771134
could not find 994082657
could not find 993891317
could not find 994278584
could not find 994278590
could not find 1101388665
could not find 1101564157
could not find 1012771669
could not find 1012771667
could not find 1012771670
could not find 1012771672
could not find 1012771673
could not find 1012771678
could not find 1010092818
could not find 1081870750
could not find 1081870759
could not find 1082434510
could not find 965267337
could not find 1078699503
could not find 1078699509
could not find 1058613919
could not find 1058613914
could not find 1058613924
could not find 1058835249
could not find 1059574070
could not find 1059792822
could not find 1059792832
could not find 1059792820
could not find 1060223938
could not find 1105664341
could not find 1106021634
could not find 1106021635
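A comparison like the one above can be sketched as a set difference between the ID columns of two tables. This is only an illustration on toy data: the helper name `find_missing_ids` and the IDs are hypothetical, and it assumes `ophys_experiment_id` is a regular column (if it is the index in your table, call `reset_index()` first).

```python
import pandas as pd


def find_missing_ids(ground_truth, candidate, column="ophys_experiment_id"):
    """Return the sorted IDs present in ground_truth but absent from candidate."""
    missing = set(ground_truth[column]) - set(candidate[column])
    return sorted(int(i) for i in missing)


# Toy tables with hypothetical IDs, standing in for the ground truth table
# and the table produced by the branch.
ground_truth = pd.DataFrame({
    "ophys_experiment_id": [11, 12, 13, 14],
    "ophys_container_id": [1, 1, 2, 2],
})
candidate = pd.DataFrame({
    "ophys_experiment_id": [11, 13],
    "ophys_container_id": [1, 2],
})

missing = find_missing_ids(ground_truth, candidate)
for experiment_id in missing:
    print("could not find", experiment_id)
```

Note that this compares experiment IDs only. To check that each container has the right experiments, the same difference could be taken over (ophys_container_id, ophys_experiment_id) pairs instead.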