Filtering metadata not saved properly
Hello,
When saving a preprocessed binary recording to a folder, two different JSON files carry conflicting annotations about whether the recording is filtered.
Here's the snippet I'm using to save:
import spikeinterface as si
import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre

# run the save in parallel chunks
global_job_kwargs = dict(n_jobs=4, chunk_duration="1s")
si.set_global_job_kwargs(**global_job_kwargs)

recording = se.read_maxwell("myrecording.raw.h5")
recording_f = spre.bandpass_filter(recording, freq_min=300, freq_max=6000)
recording_cmr = spre.common_reference(recording_f, reference="global", operator="median")
recording = recording_cmr.save(folder='testsave', format='binary')
# reload from the saved folder
recording = se.BinaryFolderRecording('testsave')
Later, I load this recording and a sorting result back into a waveform extractor with the following code:
import spikeinterface as si
import spikeinterface.sorters as ss
from spikeinterface.exporters import export_to_phy, export_report

sorting_SC2 = ss.read_sorter_folder('spykingcircus2_output', register_recording=False)

print("Exporting sorting results to phy for manual comparison")
weSC2 = si.extract_waveforms(recording=recording,
                             sorting=sorting_SC2,
                             sparse=True,
                             folder='sc2_wvfms',
                             overwrite=True,
                             max_spikes_per_unit=500)
This code raises the following exception:
Traceback (most recent call last):
  File "/home/nrenegar/analysis/export_results.py", line 26, in <module>
    weSC2 = si.extract_waveforms(recording=recording,
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1655, in extract_waveforms
    sparsity = precompute_sparsity(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1783, in precompute_sparsity
    local_we = extract_waveforms(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 1670, in extract_waveforms
    we = WaveformExtractor.create(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 320, in create
    return cls(
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 86, in __init__
    self.set_recording(recording, rec_attributes, allow_unfiltered)
  File "/home/nrenegar/miniconda3/envs/spikeint/lib/python3.10/site-packages/spikeinterface/core/waveform_extractor.py", line 702, in set_recording
    raise Exception(
Exception: The recording is not filtered, you must filter it using `bandpass_filter()`.
If the recording is already filtered, you can also do `recording.annotate(is_filtered=True).
If you trully want to extract unfiltered waveforms, use `allow_unfiltered=True`.
In the recording folder, I looked at two JSON files, binary.json and si_folder.json. In si_folder.json the annotation is is_filtered=True, but in binary.json it is False.
When I load the data back using BinaryFolderRecording, it does appear to be filtered.
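For reference, here's a minimal sketch of how I compared the flag in the two files (the exact key path for the annotations is my assumption; adjust it to the real JSON layout):
import json
from pathlib import Path
folder = Path('testsave')
for name in ('binary.json', 'si_folder.json'):
    meta = json.loads((folder / name).read_text())
    # assumed layout: the flag sits under an "annotations" key
    print(name, meta.get('annotations', {}).get('is_filtered'))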
Thanks!
I can fix this with the suggestion in the error message, but I just wanted to point out that there seems to be an issue with how the metadata is saved.
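In case it helps others, the workaround straight from the error message looks like this (a sketch based on the two suggestions it quotes):
# mark the reloaded recording as filtered, per the error message
recording = se.BinaryFolderRecording('testsave')
recording.annotate(is_filtered=True)
# or allow unfiltered waveforms explicitly
weSC2 = si.extract_waveforms(recording=recording, sorting=sorting_SC2,
                             sparse=True, folder='sc2_wvfms', overwrite=True,
                             max_spikes_per_unit=500, allow_unfiltered=True)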
Hi,
When using the save(folder="...") function, you should then use si.load_extractor("path-to-folder") to reload the object. se.BinaryFolderRecording only loads part of the info, hence the issue.
Can you try? See here
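A minimal sketch of what I mean (the is_filtered() check is just to verify the annotation survived the round trip, assuming the recording exposes the annotation that way):
import spikeinterface as si
# load_extractor restores the full provenance, including annotations
recording = si.load_extractor('testsave')
print(recording.is_filtered())  # expected: True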
Given that this confusion occurred, do you think we should add something to the save docstring? If you asked me to guess how to reload something I had just saved as binary, se.BinaryFolderRecording would be the natural choice. Also, switching to the core module instead of the extractors module might be confusing too.
So maybe we:
- make sure the save docstring is clear (we could add a line there making it clear how to reload after save)
- allow importing load_extractor in extractors so that loading is always possible from the extractors module (to keep the mental model consistent); one possible shape is sketched below
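For the second point, one possible shape (just a sketch, not a committed API decision):
# hypothetical line in spikeinterface/extractors/__init__.py,
# re-exporting the core loader so `se.load_extractor(...)` works too
from spikeinterface.core import load_extractor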
Yeah I think we should make this clearer in the docs! :) I wouldn't add it to the extractors module though, because this is supposed to be a core mechanism.
Not even importing it in the __init__ so that a user could do se.load_extractor or si.load_extractor? We did the same for some of the other core extractors. Just curious what the hard distinction is for this one?
Hi,
When using the save(folder="...") function, then you should use the si.load_extractor("path-to-folder") to reload the object. The se.BinaryFolderRecording only loads part of the info, hence the issue. Can you try? See here
This did the trick. It is a little counterintuitive that reloading a binary you saved to a folder with BinaryFolderRecording doesn't load all of the info. I went for BinaryFolderRecording because it's the object that's returned after you save. See the getting started document.
Naturally I went to the docs for BinaryFolderRecording, saw you could initialize it with a folder, and tried it out.
Maybe adding a blurb to the getting started page on how to reload saved binaries would be good. Ctrl+F for "load_extractor" gets no hits on that page.
I agree that we should work on the semantics. I think load_extractor() is fuzzy.
The true meaning is: "load any object saved or dumped by spikeinterface into a file or dict".
What about something like load_saved_spikeinterface_object? It is such a broad function that naming it both specifically and generally enough is very difficult. And it's not very easy to discover. I still think that even if we don't change the name, at minimum we should add to the save docstring that saved objects can be reloaded with si.load_extractor (a possible wording is sketched below). I can put this on my internal todo list (I'm just a bit busy this week doing data analysis myself).
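Something like this (just a sketch of the wording, not final docstring text):
def save(self, **kwargs):
    """Save the extractor to a folder.

    Notes
    -----
    Objects saved with this method can be reloaded with
    ``spikeinterface.load_extractor(folder)``, which restores all
    metadata, including annotations such as ``is_filtered``.
    """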
My dream would go in the opposite direction: spikeinterface.load()
Actually that's not too bad as long as it can truly load anything.
@samuelgarcia did you want to leave this open to think about the semantics of load? The load_extractor was added to the docs, so the immediate issue is done.
I will close this for now. I think we can think about load at a future date and the immediate issue is fixed.