Feature Request: init recording from sorting (phy, kilosort)

Open pas-calc opened this issue 9 months ago • 4 comments

For a "sorting" based on read_phy / read_kilosort (class BasePhyKilosortSortingExtractor), the params.py file is read here

https://github.com/SpikeInterface/spikeinterface/blob/85f4ff3b4c35532aa5df3238e15f901d439fadc5/src/spikeinterface/extractors/phykilosortextractors.py#L60

params = read_python(str(phy_folder / "params.py"))

For the sorting, only the sampling rate is used.

It would be helpful to have this result (params dictionary) available (like self.params = params), especially for the associated recording read using read_binary (class BinaryRecordingExtractor).

Then it could be as simple as

file_path = params["dat_path"]
sampling_frequency = params["sample_rate"]
dtype = params["dtype"]
num_channels = params["n_channels_dat"]

recording = read_binary(file_path, sampling_frequency, dtype, num_channels)

We could also have a function like read_binary_from_sorting which accesses sorting.params.

And this is just one more line.

sorting_analyzer = si.create_sorting_analyzer(sorting=sorting, recording=recording)
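A minimal sketch of the idea above, assuming params.py contains only plain `name = literal` assignments as in typical Kilosort/phy output. The helper names (`read_phy_params`, `binary_kwargs_from_params`, `read_binary_from_phy_folder`) are hypothetical and not part of spikeinterface; the parser is a standard-library stand-in for `read_python`.

```python
import ast
from pathlib import Path


def read_phy_params(phy_folder):
    """Parse the `name = literal` assignments of params.py into a dict."""
    source = (Path(phy_folder) / "params.py").read_text()
    params = {}
    for node in ast.parse(source).body:
        # Keep only plain assignments such as `sample_rate = 20000.`
        if isinstance(node, ast.Assign) and isinstance(node.targets[0], ast.Name):
            params[node.targets[0].id] = ast.literal_eval(node.value)
    return params


def binary_kwargs_from_params(params, phy_folder):
    """Translate params.py entries into keyword arguments for read_binary."""
    return {
        # Joining keeps an absolute dat_path unchanged and resolves a
        # relative one against the sorter output folder.
        "file_paths": Path(phy_folder) / params["dat_path"],
        "sampling_frequency": params["sample_rate"],
        "dtype": params["dtype"],
        "num_channels": params["n_channels_dat"],
        "file_offset": params.get("offset", 0),
    }


def read_binary_from_phy_folder(phy_folder):
    """Hypothetical helper: load the binary file that params.py points to."""
    import spikeinterface.extractors as se  # lazy import, needs spikeinterface

    params = read_phy_params(phy_folder)
    return se.read_binary(**binary_kwargs_from_params(params, phy_folder))
```

Returning a keyword dictionary rather than calling `read_binary` directly keeps the mapping from params.py keys to `read_binary` arguments easy to inspect before any data is loaded.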

pas-calc, May 13 '25 19:05

@pas-calc this is an interesting idea, but we've had many discussions about not doing this because the KS<4 (KS1-3) binary usually points to temp_wh.dat, which can only be used after unwhitening. Phy automatically applies the unwhitening matrix to this data, but we can't guarantee that this is possible because our export-to-phy function needs to work for all sorters, including those that don't whiten. So we don't really encourage using that binary with the sorting analyzer.

I don't really have a problem with returning the params from read_phy if that would be useful for people. But I don't think we would encourage this specific use case. Can you explain a bit more why you think this is useful (other than fewer lines of code)?

zm711, May 14 '25 12:05

Thanks @zm711, I see: you don't recommend using the dat binary file from params["dat_path"]. We could check hp_filtered, which per https://kilosort.readthedocs.io/en/latest/tutorials/basic_example.html indicates a

> pre-processed copy of the data (including whitening, high-pass filtering, and drift correction)

(kilosort.run_kilosort option save_preprocessed_copy=True, https://kilosort.readthedocs.io/en/latest/api.html#kilosort.run_kilosort.run_kilosort)

For my cases, using default kilosort config, the params.py generated by kilosort has hp_filtered = False and dat_path points to the same file that I was giving as input to kilosort (the raw recording without any pre-processing). The only difference I see for KS2 is a relative path vs KS4 uses absolute path. And I hope phy uses this as raw data without applying anything on it.

Full params.py :

{'dat_path': 'recording.dat',
 'n_channels_dat': 65,
 'dtype': 'int16',
 'offset': 0,
 'sample_rate': 20000.0,
 'hp_filtered': False}
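The check described above could be sketched roughly as follows. This is only a heuristic under stated assumptions: the function name `looks_like_raw_input` is hypothetical, and, as noted later in the thread, different Kilosort versions have treated hp_filtered differently, so a True result is a hint rather than a guarantee.

```python
from pathlib import PureWindowsPath


def looks_like_raw_input(params):
    """Rough guess at whether dat_path is the file originally given to the sorter."""
    # PureWindowsPath splits on both "/" and "\\", so the file name is
    # extracted correctly for relative (KS2) and absolute (KS4) paths alike.
    name = PureWindowsPath(str(params["dat_path"])).name
    # temp_wh.dat is the whitened file mentioned earlier in the thread.
    if name == "temp_wh.dat":
        return False
    # hp_filtered = True marks a pre-processed copy per the Kilosort docs.
    return not params.get("hp_filtered", False)


params = {
    "dat_path": "recording.dat",
    "n_channels_dat": 65,
    "dtype": "int16",
    "offset": 0,
    "sample_rate": 20000.0,
    "hp_filtered": False,
}
print(looks_like_raw_input(params))
```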

Two considerations I had:

  • Yes, mainly the code is shorter. It could be even shorter with:
import os
import spikeinterface.extractors as se

# define a folder for KS output containing the params.py file
folder = "kilosort_output_folder"
# read sorting
sorting = se.read_kilosort(folder)
params = sorting.params  # the proposed attribute; it does not exist yet

# then we can find the path in any case (relative or absolute):
# if params["dat_path"] is absolute, os.path.join returns it unchanged (see the doc for join)
datfile = os.path.join(folder, params["dat_path"])
# and set the recording
recording = se.read_binary(
    datfile,
    sampling_frequency=params["sample_rate"],
    dtype=params["dtype"],
    num_channels=params["n_channels_dat"],
)
  • And I thought it could be a nice concept. It could live at the SortingAnalyzer level: in addition to create_sorting_analyzer, there could be a function create_sorting_analyzer_from_ksphy_sorting that reads the KS/phy sorting (BasePhyKilosortSortingExtractor), creates the associated recording, and returns an analyzer with analyzer.sorting as defined by the user and analyzer.recording as defined by the function.

pas-calc, May 14 '25 13:05

Maybe this is a good topic for a how-to? "How to create a sorting analyzer from phy data" or something like this?

There, the recipe can be standardized and the nuances stated for the benefit of people using similar workflows.

h-mayorquin, May 14 '25 15:05

> And I hope phy uses this as raw data without applying anything on it.

Unfortunately this cannot be assumed. Phy and Kilosort used to coordinate carefully on this, but they have since diverged. Phy will always attempt to unwhiten the data, and if that fails it falls back to unwhitening with an identity matrix. I believe different versions of Kilosort have treated hp_filtered differently, and so phy ignores it (although I think it used to fiddle with that; I haven't dug into this too deeply).

> The only difference I see for KS2 is a relative path vs KS4 uses absolute path.

This is true. After KS2 they switched to an absolute path (so for 2.5, 3, and 4).

> For my cases, using default kilosort config, the params.py generated by kilosort has hp_filtered = False

We recommend that you do some preprocessing on a recording before giving it to the analyzer. See #3483, where we had a long discussion about this :). The idea is that you likely want the "raw" recording to be filtered + referenced + motion-corrected. You don't want it whitened. So loading the raw recording would still require some code for preprocessing.
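As a rough sketch of what such preprocessing code might look like, the snippet below chains a band-pass filter and a common median reference from spikeinterface.preprocessing. The helper name `preprocess_for_analyzer`, the `spre` injection argument, and the filter band (300 to 6000 Hz) are assumptions for illustration, not a recommended configuration. Motion correction (for example `correct_motion`) would be a further step and needs a probe attached to the recording, so it is left out here.

```python
# Each step is (function name in spikeinterface.preprocessing, keyword arguments).
# The parameter values are illustrative assumptions, not recommended defaults.
DEFAULT_STEPS = (
    ("bandpass_filter", {"freq_min": 300.0, "freq_max": 6000.0}),
    ("common_reference", {"reference": "global", "operator": "median"}),
)


def preprocess_for_analyzer(recording, steps=DEFAULT_STEPS, spre=None):
    """Apply filtering and referencing (no whitening) to a raw recording."""
    if spre is None:
        # Lazy import so the step list can be inspected without spikeinterface.
        import spikeinterface.preprocessing as spre
    for name, kwargs in steps:
        # Each preprocessing function returns a new lazy recording object.
        recording = getattr(spre, name)(recording, **kwargs)
    return recording
```

The preprocessed recording would then be passed to create_sorting_analyzer in place of the raw one.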

> create_sorting_analyzer_from_ksphy_sorting

We make a strong attempt to be sorter-agnostic at the spikeinterface level. So although we load the sorting data for all varieties of sorters (that people want), we try not to have extra functions such as create_sorting_analyzer_from_ksphy_sorting, since that would be Kilosort-only.

zm711, May 14 '25 15:05