Unclosed file after opening the recording
Hi!
I recently started getting an error when opening a binary file:
sys:1: ResourceWarning: unclosed file <_io.BufferedReader name='my\\dat\\file.dat'>
I used tracemalloc to identify where the file was being opened, and it is here:
class BinaryRecordingSegment(BaseRecordingSegment):
    def __init__(self, file_path, sampling_frequency, t_start, num_channels, dtype, time_axis, file_offset):
        BaseRecordingSegment.__init__(self, sampling_frequency=sampling_frequency, t_start=t_start)
        self.num_channels = num_channels
        self.dtype = np.dtype(dtype)
        self.file_offset = file_offset
        self.time_axis = time_axis
        self.file_path = file_path
        self.file = open(self.file_path, "rb")
        self.bytes_per_sample = self.num_channels * self.dtype.itemsize
        self.data_size_in_bytes = Path(file_path).stat().st_size - file_offset
        self.num_samples = self.data_size_in_bytes // self.bytes_per_sample
It opens the file with self.file = open(self.file_path, "rb") but never closes it. Could that be the issue? I never had this error before, so I don't know what led to it. Am I missing something?
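In case it is useful, this is roughly how the allocation site behind the ResourceWarning can be surfaced (a minimal sketch; the exact invocation I used may differ, and the script name is a placeholder):

# Minimal sketch: enable tracemalloc so the ResourceWarning prints the
# traceback of the open() call that created the unclosed file.
# Equivalent from the command line:
#     python -X tracemalloc=25 -W always::ResourceWarning my_script.py
import tracemalloc
import warnings

tracemalloc.start(25)                              # keep up to 25 frames per allocation
warnings.simplefilter("always", ResourceWarning)   # make sure the warning is not silenced
# ... open the recording and run the rest of the pipeline here ...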
I've been trying to track these instances. It would be helpful if you could also indicate your OS and whether you're working from a server.
There are a couple of places where we don't close files. Unfortunately, some of them are memmaps, and numpy memmaps offer no guarantee that they can be closed. So as much info as possible about what type of file you were trying to write would also be useful!
I have Windows 11 Pro for Workstations. I first opened the recording saved as a .dat file (from Intan), built the lazy recording, and then when launching the sorting I get this error that the file is unclosed.
Okay, that's good to know. Windows file management is a bit harder to deal with than Mac/Linux (I know our workstations are also Windows). Does it give any Windows errors or just the sys error?
Only the sys error
It opens the file with
self.file = open(self.file_path, "rb") but never closes it. Could that be the issue?
All readers open a file. The file is closed when you destroy the object (the recording) itself. This is by design everywhere.
This should not be a problem as long as you do not try to write to the same file again while the previous recording has not yet been destroyed.
Very often a simple del rec resolves the issue.
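Something like this (a minimal sketch; the read_binary arguments are placeholders and depend on your file and spikeinterface version):

# Hedged sketch of the workaround: drop the last Python reference to the
# recording, then force a garbage-collection pass so the segment's file
# handle is actually released before the file is reused.
import gc
import spikeinterface.full as si

rec = si.read_binary("my/dat/file.dat", sampling_frequency=30_000,
                     num_channels=64, dtype="int16")   # placeholder arguments
# ... use rec ...
del rec         # remove the last reference to the recording
gc.collect()    # make sure the underlying file handle is closed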
My understanding was that the problem is the recording being fed into run_sorter, such that when we call load_recording_from_folder it fails because the recording has not been closed. If that's the case, the user can't del it. @OlivierPeron could you post your script so we can better figure out when the issue is occurring?
Yes, this is exactly what is happening. My script is basically:
recording = si.read_binary(...)
filtered_recording = spre.filter(...)

# Define groups per probe
filtered_recording.set_property('group', ...)

# Splitting by probe
recordings = filtered_recording.split_by('group')

pp_recs = []
for rec in recordings:
    rec_cmr = spre.common_reference(...)
    pp_recs.append(rec_cmr)

merged_rec = si.aggregate_channels(pp_recs)
merged_rec.set_probegroup(...)

sorting = ss.run_sorter_by_property(merged_rec, 'group', ...)
I've never had to do this but you could try once to do:
del recording, filtered_recording, recordings, rec_cmr, pp_recs
right before you run ss.run_sorter_by_property
that way you only have a reference to the final recording object (in your case merged_rec). The way Windows holds onto files can be such a mess that these issues are sometimes hard to debug, and it is best to just clean up everything except what you are using. If you're able to reproduce the error even after cleaning up the other references, then my idea would be to share the files with me; since I also have a Windows workstation (Windows 10 at the moment), I can see if I can reproduce on mine, and then we could strategize about ways to fix this long-term on Windows. But if doing the del cleans it up, then I would just add that to your scripts until the error comes back.
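If you want to double-check whether the handle is actually gone after the del, something like this should work (a hedged sketch using the third-party psutil package, which you would need to install):

# Hedged diagnostic sketch: list the files the current Python process still
# holds open and flag any .dat handles that survived the del.
import psutil

open_paths = [f.path for f in psutil.Process().open_files()]
dangling = [p for p in open_paths if p.endswith(".dat")]
print("still-open .dat files:", dangling)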
My preprocessing steps are done in a function (not shown here), so the different recordings shouldn't stay in memory. The sorting is also done in another function. But I will try and let you know!
Related:
https://github.com/SpikeInterface/spikeinterface/pull/3296#issuecomment-2301498787
After a few tests deleting the different recordings, I still have the same error. I also tried saving the recording so I could reopen it properly, but I get the same error again.
Cool. @samuelgarcia @h-mayorquin what do we think about this? We need to balance the cost of opening a remote file against the fact that Windows is suboptimal at letting users interact with open files. I honestly don't understand this, because it only happens on certain workstations, so it's hard to say whether this is a Windows issue or a Windows settings issue. @OlivierPeron do you always run your terminal in admin mode? And which terminal do you use?
I support opening and closing connections within get_traces to prevent leaving anything hanging. The issue is that I don't have the infrastructure to assess the cost of opening and closing in a network environment; locally, the cost is negligible.
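To make the proposal concrete, here is a hedged illustration (not the actual spikeinterface implementation) of what opening and closing per get_traces call could look like for a binary segment:

# Hedged illustration of the "open/close inside get_traces" idea being
# discussed: the segment keeps only the path, and every traces request opens,
# reads, and closes the file, so no handle outlives the call.
import numpy as np

class EphemeralBinarySegment:
    def __init__(self, file_path, num_channels, dtype, file_offset=0):
        self.file_path = file_path
        self.num_channels = num_channels
        self.dtype = np.dtype(dtype)
        self.file_offset = file_offset
        self.bytes_per_sample = self.num_channels * self.dtype.itemsize

    def get_traces(self, start_frame, end_frame):
        num_frames = end_frame - start_frame
        with open(self.file_path, "rb") as f:        # handle is closed on exit
            f.seek(self.file_offset + start_frame * self.bytes_per_sample)
            raw = f.read(num_frames * self.bytes_per_sample)
        return np.frombuffer(raw, dtype=self.dtype).reshape(num_frames, self.num_channels)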
Yeah, I think it is tough to deal with. But if this file issue prevents Windows users from using the program, then wouldn't you take a slow working program over a function that errors every time you try to use it? Worst case scenario, we throttle Windows users by making this OS-specific?
I don't understand; I am advocating for opening and closing in get_traces, which has a negligible cost for most users and should work for all OSes. Maybe I am missing something.
I'm saying that if Sam says the network open/close is too expensive time-wise, then since Windows only works with the close (for some stations/settings), maybe we don't close for speed on Mac/Linux but do close on Windows. Because slow but working is better than not working at all. I agree that locally it's probably not a problem.
Ah, I see, you were addressing Sam's concerns. Sorry for the confusion.
Meanwhile maybe this will help:
#3833
Cool. @samuelgarcia @h-mayorquin what do we think about this? We need to balance the cost of opening a remote file against the fact that Windows is suboptimal at letting users interact with open files. I honestly don't understand this, because it only happens on certain workstations, so it's hard to say whether this is a Windows issue or a Windows settings issue. @OlivierPeron do you always run your terminal in admin mode? And which terminal do you use?
Using the Anaconda prompt. I tried running as admin and the same issue occurs.
@OlivierPeron,
did you happen to retest your script with Heberto's addition to main? It's basically impossible for us to write a test for the dunder del (__del__), so having a real user test it would be great. We are still trying to think of the best way to deal with this to balance Windows vs network-associated concerns. If this becomes too big a problem, you could also run this through WSL, which doesn't have the same restrictions on working with open files that Windows has.
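For anyone following along, a hedged sketch of the general shape of a __del__-based cleanup (not the actual change that landed on main):

# Hedged sketch: close the file handle when the segment is garbage-collected,
# tolerating partially-initialized objects.
class FileBackedSegment:
    def __init__(self, file_path):
        self.file = open(file_path, "rb")

    def __del__(self):
        file = getattr(self, "file", None)        # may be absent if __init__ failed
        if file is not None and not file.closed:
            file.close()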
I haven't yet, but I will! So far I was trying to work around this issue just to be able to spike sort. I'll let you know!