visual_behavior_analysis failure saving multi session trials df to hdf5

When I run visual_behavior.ophys.io.create_multi_session_mean_df.get_multi_session_mean_df(experiment_ids, cache_dir, conditions=['cell_specimen_id', 'change_image_name', 'trial_type'])

I get the error below. It doesn't happen for any other input conditions. I have assumed that this is because the resulting dataframe ends up being much larger than the others, but it could also be an issue with data types.

C:\Anaconda\lib\site-packages\pandas\core\generic.py:1471: PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed-integer,key->block1_values] [items->['change_image_name', 'trial_type', 'mean_trace', 'sem_trace', 'mean_responses', 'experiment_container_id', 'targeted_structure', 'specimen_driver_line', 'cre_line', 'reporter_line', 'full_genotype', 'session_type', 'stage', 'experiment_date', 'project_id', 'rig', 'image_set']]

  return pytables.to_hdf(path_or_buf, key, self, **kwargs)
Traceback (most recent call last):
  File "C:/Users/marinag/Documents/Code/visual_behavior_analysis/visual_behavior/ophys/io/create_multi_session_mean_df.py", line 127, in <module>
    conditions=['cell_specimen_id', 'change_image_name', 'trial_type'])
  File "C:/Users/marinag/Documents/Code/visual_behavior_analysis/visual_behavior/ophys/io/create_multi_session_mean_df.py", line 92, in get_multi_session_mean_df
    format='fixed')
  File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 1471, in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)
  File "C:\Anaconda\lib\site-packages\pandas\io\pytables.py", line 281, in to_hdf
    f(store)
  File "C:\Anaconda\lib\site-packages\pandas\io\pytables.py", line 275, in <lambda>
    f = lambda store: store.put(key, value, **kwargs)
  File "C:\Anaconda\lib\site-packages\pandas\io\pytables.py", line 866, in put
    self._write_to_group(key, value, append=append, **kwargs)
  File "C:\Anaconda\lib\site-packages\pandas\io\pytables.py", line 1341, in _write_to_group
    s.write(obj=value, append=append, complib=complib, **kwargs)
  File "C:\Anaconda\lib\site-packages\pandas\io\pytables.py", line 2930, in write
    self.write_array('block%d_values' % i, blk.values, items=blk_items)
  File "C:\Anaconda\lib\site-packages\pandas\io\pytables.py", line 2698, in write_array
    vlarr.append(value)
  File "C:\Anaconda\lib\site-packages\tables\vlarray.py", line 537, in append
    self._append(nparr, nobjects)
  File "tables\hdf5extension.pyx", line 1929, in tables.hdf5extension.VLArray._append (tables\hdf5extension.c:20794)
OverflowError: Python int too large to convert to C long

Apr 19 '19 05:04 matchings

@matchings did this error show up on the cluster as well? If it only happens on Windows, it may be related to this: https://stackoverflow.com/questions/38314118/overflowerror-python-int-too-large-to-convert-to-c-long-on-windows-but-not-ma

Apr 23 '19 17:04 nickponvert

@nickponvert the error shows up on the cluster also. Here is a cluster job record that failed: "\allen\programs\braintv\workgroups\nc-ophys\Marina\ClusterJobs\JobRecords2\12686038.qmaster2.corp.alleninstitute.org.err"

However, I did finally get the dataframe to save by removing some of the conditions that went into it, which effectively made it smaller. This doesn't solve the actual problem, of course, but it works as a temporary workaround.
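A related way to make each write smaller without dropping conditions is to split the dataframe into pieces and save each piece separately. The following is only a minimal sketch under assumptions: the helper name iter_group_chunks, the experiment_id column, and the toy values are hypothetical and not taken from the repository, and it has not been tested against the failing dataframe, so it may or may not avoid the OverflowError.

```python
import pandas as pd


def iter_group_chunks(df, group_col, groups_per_chunk):
    """Yield (key, sub-dataframe) pairs, each holding a few whole groups.

    Hypothetical helper for illustration; not part of visual_behavior.
    """
    group_ids = list(pd.unique(df[group_col]))
    for start in range(0, len(group_ids), groups_per_chunk):
        ids = group_ids[start:start + groups_per_chunk]
        key = "chunk_{:04d}".format(start // groups_per_chunk)
        # Keep all rows that belong to the groups assigned to this chunk.
        yield key, df[df[group_col].isin(ids)]


# Toy stand-in for the multi-session dataframe; the values are made up.
toy = pd.DataFrame({
    "experiment_id": [10, 10, 20, 30, 30],
    "mean_response": [0.1, 0.2, 0.3, 0.4, 0.5],
})
chunks = dict(iter_group_chunks(toy, "experiment_id", groups_per_chunk=2))
```

Each chunk could then be written under its own key, for example with chunk.to_hdf(path, key=key, format='fixed'), and the pieces concatenated again after loading.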

I am a bit confused, because when I ran this code to check data types, they were mostly strings or floats:

But when I look at the types a different way, most are objects:

Hopefully there is a hint in there somewhere.
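For anyone trying to reproduce the two views of the types described above, here is a minimal sketch on a toy dataframe (the values and the image names are made up, and this is not the code that was originally run). It shows how a column can hold plain strings or arrays element by element while pandas still reports the column dtype as object, which is the kind of column the PerformanceWarning says PyTables will pickle.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the multi-session dataframe; the values are made up.
# dtype=object is set explicitly so that strings are stored the way older
# pandas versions store them, regardless of the installed version.
df = pd.DataFrame({
    "cell_specimen_id": [1, 2],
    "change_image_name": pd.Series(["im065", "im077"], dtype=object),
    "mean_response": [0.12, 0.34],
    "mean_trace": pd.Series([np.zeros(3), np.ones(3)], dtype=object),
})

# Column-level view: the dtype pandas stores for each column.
column_dtypes = {col: str(dtype) for col, dtype in df.dtypes.items()}

# Element-level view: the Python type of the first value in each column.
element_types = {col: type(df[col].iloc[0]).__name__ for col in df.columns}

# Columns that have no direct numeric dtype and are stored as Python objects.
object_columns = [col for col, dtype in column_dtypes.items() if dtype == "object"]

print(column_dtypes)
print(element_types)
print(object_columns)
```

In this toy case the string column and the array column both come back as object at the column level, even though their elements are str and ndarray respectively.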

You can check out the dataframe that I got to save here: "\allen\programs\braintv\workgroups\nc-ophys\visual_behavior\visual_behavior_production_analysis\multi_session_summary_dfs\mean_trials_change_image_name_trial_type_df.h5"

And the code to generate it is in visual_behavior.ophys.io.create_multi_session_mean_df.

You can see the function call in the main block at the bottom of that script. It specifically gives problems when I run it with trial_type as a condition, presumably because that results in a larger dataframe, with this function call: get_multi_session_mean_df(experiment_ids, cache_dir, conditions=['cell_specimen_id', 'change_image_name', 'trial_type'])

Apr 24 '19 02:04 matchings

@matchings is this issue still a problem?

Jan 23 '20 03:01 nickponvert