
Filtering metadata not saved properly

naterenegar opened this issue 1 year ago

Hello,

When saving a preprocessed binary recording to a folder, there are conflicting annotations in two different json files about the status of the binary filtering.

Here's the snippet I'm using to save:

import spikeinterface as si
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre

global_job_kwargs = dict(n_jobs=4, chunk_duration="1s")
si.set_global_job_kwargs(**global_job_kwargs)

recording = se.read_maxwell("myrecording.raw.h5")
recording_f = spre.bandpass_filter(recording, freq_min=300, freq_max=6000)
recording_cmr = spre.common_reference(recording_f, reference="global", operator="median")
recording = recording_cmr.save(folder='testsave', format='binary')
recording = se.BinaryFolderRecording('testsave')

Later I load back this recording and a sorting result into a waveform extractor with the following code:

import spikeinterface as si
import spikeinterface.sorters as ss

sorting_SC2 = ss.read_sorter_folder('spykingcircus2_output', register_recording=False)

from spikeinterface.exporters import export_to_phy, export_report

print("Exporting sorting results to phy for manual comparison")
weSC2 = si.extract_waveforms(recording=recording,
                             sorting=sorting_SC2,
                             sparse=True,
                             folder='sc2_wvfms',
                             overwrite=True,
                             max_spikes_per_unit=500)

This code raises the following exception:

Traceback (most recent call last):
  File "/home/nrenegar/analysis/export_results.py", line 26, in <module>
    weSC2 = si.extract_waveforms(recording=recording,
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1655, in extract_waveforms
    sparsity = precompute_sparsity(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1783, in precompute_sparsity
    local_we = extract_waveforms(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1670, in extract_waveforms
    we = WaveformExtractor.create(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 320, in create
    return cls(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 86, in __init__
    self.set_recording(recording, rec_attributes, allow_unfiltered)
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 702, in set_recording
    raise Exception(
Exception: The recording is not filtered, you must filter it using `bandpass_filter()`.If the recording is already filtered, you can also do `recording.annotate(is_filtered=True).
If you trully want to extract unfiltered waveforms, use `allow_unfiltered=True`.

In the recording folder, I looked at two different JSON files, binary.json and si_folder.json. In si_folder.json the annotation is is_filtered=True, but in binary.json the same annotation is False.

When I inspect the re-loaded data using BinaryFolderRecording, it does appear to be filtered.
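For anyone reproducing this, the mismatch can be spotted by comparing the annotations blocks of the two JSON files. A minimal stdlib-only sketch (the is_filtered-under-annotations layout is an assumption based on this report; the file names come from the save folder above):

```python
import json
from pathlib import Path


def is_filtered_flag(metadata):
    """Pull the is_filtered annotation out of a metadata dict,
    or None if it is absent (nesting layout assumed)."""
    return metadata.get("annotations", {}).get("is_filtered")


def check_folder(folder):
    """Compare the flag across binary.json and si_folder.json in a
    saved recording folder."""
    flags = {}
    for name in ("binary.json", "si_folder.json"):
        path = Path(folder) / name
        if path.exists():
            flags[name] = is_filtered_flag(json.loads(path.read_text()))
    return flags


# Mimicking the reported state with inline dicts:
si_folder = {"annotations": {"is_filtered": True}}
binary = {"annotations": {"is_filtered": False}}
mismatch = is_filtered_flag(si_folder) != is_filtered_flag(binary)
print("conflicting is_filtered annotations:", mismatch)
```

Running check_folder('testsave') on the actual save folder should surface the same conflict.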

Thanks!

naterenegar avatar Mar 22 '24 17:03 naterenegar

I can work around this with the suggestion in the error message, but I wanted to point out that there seems to be an issue with how the metadata is saved.

naterenegar avatar Mar 22 '24 17:03 naterenegar

Hi,

After saving with the save(folder="...") function, you should reload the object with si.load_extractor("path-to-folder"). se.BinaryFolderRecording only loads part of the info, hence the issue.

Can you try? See here
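A hedged sketch of the suggested reload path (si.load_extractor is the real API; the reload_saved_recording wrapper and its si_folder.json guard are illustrative assumptions):

```python
from pathlib import Path


def reload_saved_recording(folder):
    """Reload a recording saved via recording.save(folder=...),
    preferring load_extractor over constructing BinaryFolderRecording
    directly, per the maintainer's advice.

    Returns None when the folder does not look like a saved
    spikeinterface folder, so the sketch stays runnable anywhere.
    """
    folder = Path(folder)
    # si_folder.json is the file that carries the full annotations,
    # including is_filtered=True (per this thread's report)
    if not (folder / "si_folder.json").exists():
        return None
    import spikeinterface as si
    return si.load_extractor(folder)
```

With the folder from the first snippet, recording = reload_saved_recording('testsave') should restore the full annotations, including is_filtered.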

alejoe91 avatar Mar 24 '24 09:03 alejoe91

Given that this confusion occurred, do you think we should add something to the save docstring? If you asked me to guess how to reload something I had just saved as binary, se.BinaryFolderRecording would be the natural choice. Switching to the core module instead of the extractors module might be confusing too.

So maybe we:

  1. make sure the save docstring is clear (we could add a line to make it clear how to load from save)
  2. allow import of load_extractor in extractors so that extraction is always possible with extractors module (to keep the mental model consistent)

zm711 avatar Mar 24 '24 15:03 zm711

Yeah I think we should make this clearer in the docs! :) I wouldn't add it to the extractors module though because this is supposed to be a core mechanism.

alejoe91 avatar Mar 24 '24 15:03 alejoe91

I wouldn't add it to the extractors module though because this is supposed to be a core mechanism.

not even importing in the init so that a user could do se.load_extractor or si.load_extractor? We did the same for some of the other core extractors. Just curious what the hard distinction is for this?

zm711 avatar Mar 24 '24 15:03 zm711

Hi,

After saving with the save(folder="...") function, you should reload the object with si.load_extractor("path-to-folder"). se.BinaryFolderRecording only loads part of the info, hence the issue.

Can you try? See here

This did the trick. It is a little counterintuitive that reloading a binary you saved to a folder with BinaryFolderRecording doesn't load all of the info. I went for BinaryFolderRecording because it's the object that's returned after you save; see the getting started document.

Naturally I went to the docs for BinaryFolderRecording, saw you could initialize it with a folder, and tried it out.

Maybe adding a blurb to the getting started page on how to reload saved binaries would be good. There are no hits on Ctrl+F for "load_extractor" on that page.

naterenegar avatar Mar 24 '24 16:03 naterenegar

I agree that we should work on the semantics. I think load_extractor() is fuzzy; its true meaning is: "load any object saved or dumped by spikeinterface into a file or dict".

samuelgarcia avatar Mar 25 '24 07:03 samuelgarcia

What about something like load_saved_spikeinterface_object? It is such a broad function that finding a name both specific enough and general enough is very difficult, and it's not very easy to discover. Even if we don't change the name, at minimum we should add to the save docstring that saved objects can be reloaded with si.load_extractor. I can put this on my internal todo list (I'm just a bit busy this week doing data analysis myself).

zm711 avatar Mar 25 '24 12:03 zm711

My dream would be to go in the opposite direction: spikeinterface.load()

samuelgarcia avatar Mar 25 '24 13:03 samuelgarcia

Actually that's not too bad as long as it can truly load anything.
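A unified spikeinterface.load() could dispatch on what a saved folder contains. A rough, hypothetical sketch (pick_loader and the marker table are invented for illustration; only the two JSON files discussed in this thread are covered):

```python
from pathlib import Path

# Hypothetical dispatch table: marker file -> loader name. A real
# implementation would cover many more formats; these two entries
# just echo the files mentioned in this issue.
_MARKERS = [
    ("si_folder.json", "load_extractor"),      # full provenance and annotations
    ("binary.json", "BinaryFolderRecording"),  # raw binary info only
]


def pick_loader(folder):
    """Return the name of the loader a unified load() might dispatch
    to for this folder, or None if nothing is recognized."""
    for marker, loader in _MARKERS:
        if (Path(folder) / marker).exists():
            return loader
    return None
```

Ordering matters here: a folder saved with save() contains both files, and the full-provenance loader should win, which is exactly the pitfall this issue describes.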

zm711 avatar Mar 25 '24 13:03 zm711

@samuelgarcia did you want to leave this open to think about the semantics of load? The load_extractor was added to the docs so the immediate issue is done.

zm711 avatar Apr 12 '24 13:04 zm711

I will close this for now. I think we can think about load at a future date and the immediate issue is fixed.

zm711 avatar Jun 19 '24 16:06 zm711