
ZeroDivisionError - Unable to read an older Micromed file--test file needed

Open theairbend3r opened this issue 9 months ago • 15 comments

Describe the bug: I get a ZeroDivisionError when trying to read a Micromed .trc file via MicromedIO(file_path).read(). The same file can be read with Wonambi (another Python package) as wonambi.Dataset(file_path).

To Reproduce: The error traceback is as follows.

Traceback (most recent call last):
  File ".venv/lib/python3.11/site-packages/marimo/_runtime/executor.py", line 141, in execute_cell
    exec(cell.body, glbls)
    ....

    trc = MicromedIO(file_path).read()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.11/site-packages/neo/io/micromedio.py", line 13, in __init__
    BaseFromRaw.__init__(self, filename)
  File ".venv/lib/python3.11/site-packages/neo/io/basefromrawio.py", line 77, in __init__
    self.parse_header()
  File ".venv/lib/python3.11/site-packages/neo/rawio/baserawio.py", line 211, in parse_header
    self._parse_header()
  File ".venv/lib/python3.11/site-packages/neo/rawio/micromedrawio.py", line 176, in _parse_header
    self._t_starts.append(seg_start / self._sampling_rate)
                          ~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero

Expected behaviour: The file should have been read. Based on the error, the ZeroDivisionError may be caused by self._sampling_rate being 0 when it is read in MicromedRawIO as self._sampling_rate = float(np.unique(signal_channels["sampling_rate"])[0]).

However, I am able to access the sampling frequency in Wonambi as follows (it comes out to be 512, btw).

trc = wonambi.Dataset(file_path)
trc.header["s_freq"]

Environment:

  • OS: Linux
  • Python 3.11
  • Neo 0.15.0.dev0 (the latest main branch)
  • NumPy 2.2.5

theairbend3r avatar May 01 '25 11:05 theairbend3r

Two questions @theairbend3r:

1. Does this happen for multiple files or just this one file?
2. Could you share the file here or privately so we can fix the raw IO? I don't work with Micromed often, so I would need a file in order to figure this out!

It might be that the file assumptions changed between versions or something, but we can definitely fix this (unless you want to try to fix it and submit a PR :) ).

zm711 avatar May 01 '25 12:05 zm711

Just so I remember it, here is what I see if we backtrack the issue:

First, we check that the sampling rates are the same across channels here: https://github.com/NeuralEnsemble/python-neo/blob/1dfda7a9ded08653fe5c1a02020edfc71ecf984d/neo/rawio/micromedrawio.py#L163-L165

So Neo thinks all your sampling rates are 0. This could happen because of this line: https://github.com/NeuralEnsemble/python-neo/blob/1dfda7a9ded08653fe5c1a02020edfc71ecf984d/neo/rawio/micromedrawio.py#L151

We use Rate_Min there, which, if 0, would make each sampling rate 0. We grab that value here:

https://github.com/NeuralEnsemble/python-neo/blob/1dfda7a9ded08653fe5c1a02020edfc71ecf984d/neo/rawio/micromedrawio.py#L67

So I would definitely need the file to see what version it is and why our previous logic might not hold up any more.
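The per-channel rate derivation described above can be sketched standalone (the helper name and offsets here are illustrative, not the real micromedrawio layout):

```python
import io
import struct

def read_channel_rate(f, rate_min):
    # Mirror of the logic described above: skip 8 bytes inside the
    # per-channel block, read an unsigned 16-bit coefficient ('H'),
    # and scale the header's Rate_Min by it. A 0 coefficient would
    # produce a 0 Hz channel, which is the failure seen here.
    f.seek(8, 1)
    (coeff,) = struct.unpack("H", f.read(2))
    return coeff * rate_min

# Fake per-channel block: 8 padding bytes, then coefficient = 1.
block = io.BytesIO(b"\x00" * 8 + struct.pack("H", 1))
print(read_channel_rate(block, 512))  # -> 512
```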

zm711 avatar May 01 '25 12:05 zm711

Hey @zm711,

does this happen for multiple files or just this one file?

multiple files

could you share the file here or privately

I don't think I can do this. Let me check and get back to you. In the meantime, I can try to debug why it's happening.

I loaded this package locally, and Rate_Min is never 0. f.read_f('H') seems to be the culprit:

f.seek(8, 1)
(sampling_rate,) = f.read_f("H")
sampling_rate *= Rate_Min

I print out these values in this for loop for each channel:

for c in range(Num_Chan):
    ...
    print(f"Before seek: {f.tell()}")
    f.seek(8, 1)
    print(f"After seek: {f.tell()}")

    (sampling_rate,) = f.read_f("H")
    print(f"Raw value read: {(f.read_f('H'))}, Current position: {f.tell()}")
    print(f"Extracted sampling rate: {sampling_rate}")

    sampling_rate *= Rate_Min
    print(c, sampling_rate, Rate_Min)
    print("=" * 50)
    ....

I get the following

Before seek: 5284
After seek: 5292
Raw value read: (0,), Current position: 5296
Extracted sampling rate: 0
0 0 512
==================================================
Before seek: 5412
After seek: 5420
Raw value read: (1,), Current position: 5424
Extracted sampling rate: 0
1 0 512
==================================================
Before seek: 5540
After seek: 5548
Raw value read: (2,), Current position: 5552
Extracted sampling rate: 0
2 0 512
==================================================
.
.
.
python-neo/neo/rawio/micromedrawio.py:148: RuntimeWarning: overflow encountered in scalar multiply
  f.seek(pos + code[c] * 128 + 2, 0)
Before seek: 16420
After seek: 16428
Raw value read: (87,), Current position: 16432
Extracted sampling rate: 0
120 0 512
==================================================
Before seek: 16548
After seek: 16556
Raw value read: (88,), Current position: 16560
Extracted sampling rate: 0
121 0 512
==================================================
.
.

Not sure why it is happening, but sampling_rate in the assignment (sampling_rate,) = f.read_f("H") is always 0. This makes the signal_channels = np.array(signal_channels, dtype=_signal_channel_dtype) array full of zeros, which in turn causes the eventual error in self._sampling_rate = float(np.unique(signal_channels["sampling_rate"])[0])
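The chain from all-zero coefficients to the exception can be reproduced in isolation; the names mirror the Neo internals, but this snippet is standalone:

```python
import numpy as np

# Every channel was parsed with a 0 coefficient, so the structured
# array holds only zeros (the dtype field name mirrors Neo's).
signal_channels = np.zeros(10, dtype=[("sampling_rate", "float64")])

# np.unique collapses the zeros to a single 0.0.
sampling_rate = float(np.unique(signal_channels["sampling_rate"])[0])

seg_start = 1024
try:
    t_start = seg_start / sampling_rate  # same division as in _parse_header
except ZeroDivisionError as exc:
    print(exc)  # float division by zero
```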

theairbend3r avatar May 01 '25 14:05 theairbend3r

Cool, that helps us narrow it down. I'll see if I can figure out what is going on with one of our test files, but definitely let me know if you can share a file. We can delete it after figuring things out, but we totally understand wanting/needing to keep certain data private. Could you let me know the version of the file? It is obtained here: https://github.com/NeuralEnsemble/python-neo/blob/1dfda7a9ded08653fe5c1a02020edfc71ecf984d/neo/rawio/micromedrawio.py#L71

zm711 avatar May 01 '25 15:05 zm711

The other thing you could try for me is to make an environment with NumPy 1.26. We have been having some overflow issues that are hard to find in different IOs, so if this works with NumPy 1.26, that tells me it is a NumPy issue, and it would reinforce our need for the file in order to adapt to NumPy >= 2 (unfortunately, for bandwidth reasons we only test on small files, and overflows occur on bigger files, so it's hard for us to catch these issues until someone discovers them with real data).

zm711 avatar May 01 '25 15:05 zm711

The header version is 4. I tried it out with NumPy 1.26; the overflow warning disappears, but the ZeroDivisionError still persists.

Let me know if there's any more information I can provide to help debug this!

theairbend3r avatar May 01 '25 15:05 theairbend3r

Based on testing with our own test files, it seems to me that sampling_rate is actually a multiplication factor for Rate_Min, so I think that was not completely captured in the code previously. For our test files, Rate_Min was set to 256 and sampling_rate was 1, giving an overall sampling rate of 256. The two options are that your channels were off and so the code is registering 0 (seems unlikely, since you got other code to work), or that the sampling_rate we are checking is not quite right. But since our test data is working as expected, this is hard to debug.

Would you be able to record a junk file? I don't need to see your real data, but if you have access to the recording equipment, you could put in a fake name etc. and just record 1 second of trash data and send that over. That way we can see if this is something with your setup overall, and it will also give me access to the problem file's style so I can troubleshoot more. Without something to work with, this is going to be tricky.

zm711 avatar May 01 '25 16:05 zm711

Also, just to note, there's a bug in your debugging script:

(sampling_rate,) = f.read_f("H")
print(f"Raw value read: {(f.read_f('H'))}, Current position: {f.tell()}")

In this case, the f-string runs read_f('H') a second time, which shifts the file position, so you should not print that value or you'll read things out of order. We still need to diagnose the problem, but I just wanted to make sure people didn't get confused.
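A corrected version of that part of the loop unpacks once and prints the already-bound value (sketched here with struct and an in-memory buffer so it runs standalone; read_u16 stands in for Neo's f.read_f("H")):

```python
import io
import struct

def read_u16(f):
    # Stand-in for neo's f.read_f("H"): one unsigned 16-bit read.
    return struct.unpack("H", f.read(2))

buf = io.BytesIO(struct.pack("HHH", 1, 1, 1))  # three fake coefficients

for c in range(3):
    (sampling_rate,) = read_u16(buf)
    # Print the bound value; calling read_u16 again inside the
    # f-string would advance the file position a second time.
    print(f"c = {c}, raw = {sampling_rate}, pos = {buf.tell()}")
```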

zm711 avatar May 01 '25 17:05 zm711

Would you be able to record a junk file? I don't need to see your real data, but if you have access to the recording equipment you could put in a fake name etc and just record 1 second of trash data and send that over?

I asked around, and we no longer have access to the equipment used to record this data, so a test recording is not possible, and unfortunately the data is not allowed to be shared externally either.

in this case with the f-string you are running your read_f('H') another time which is messing with the positioning. So you should not print that value so that you don't shift things out of order.

Ah yes, good catch!


For a file with no errors, both f.read_f("H")[0] and struct.unpack("H", f.read(struct.calcsize("H")))[0] print out as 1 and give no overflow warnings with NumPy 2.2.

for loop:
    ...
    (sampling_rate,) = f.read_f("H")
    # sampling_rate = struct.unpack("H", f.read(struct.calcsize("H")))[0] 
    print(f"c = {c}, {sampling_rate}")
    ...


-- output --

c = 0, 1
c = 1, 1
c = 2, 1
...

But for this problematic file, both f.read_f("H")[0] and struct.unpack("H", f.read(struct.calcsize("H")))[0] print out as 0 and give the overflow warning with NumPy 2.2.

c = 0, 0
c = 1, 0
c = 2, 0
...

Do you have any other ideas I could explore to debug this on my end?

theairbend3r avatar May 07 '25 13:05 theairbend3r

Could you try printing struct.calcsize('H')? I just want to double-check that the int size is always 2 bytes. Could you also give me any metadata about the problem files (are they super old or super new)? I'm trying to see if this is something that we didn't know about or something that has changed.

What I'm referring to is this (https://docs.python.org/3/library/struct.html): 'H' should be 2 bytes, but the int represented by 'i' is 4 bytes. I'm wondering if there was a format change somewhere and we are using the wrong size now...
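For reference, the sizes from the struct docs can be checked directly (values shown are for typical platforms; a spec change from a 2-byte to a 4-byte field would make every subsequent read land at the wrong offset):

```python
import struct

# 'H' is an unsigned short (2 bytes); 'i' is an int (4 bytes on
# typical platforms). The '<' prefix forces the standard sizes,
# independent of platform alignment.
print(struct.calcsize("H"))   # 2
print(struct.calcsize("<H"))  # 2
print(struct.calcsize("<i"))  # 4
```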

I'm editing this a bunch....

The other thing you could try is explicitly setting format=None in the read_f function. Maybe it is setting some sort of offset inappropriately.

I really don't see why any of these things should happen so most importantly we need to know everything that you're allowed to tell us about this file.

zm711 avatar May 07 '25 14:05 zm711

Could you try and print struct.calcsize('H') I just want to doublecheck that the int size is always 2 bytes?

Yep, always 2

the other thing you could try is explicitly setting format=None in the read_f function. Maybe it is setting some sort of offset in appropriately.

Do you mean setting offset=None, right? There's no parameter called format in this function. I set offset=None, but it made no difference.

Could you give me any metadata about the problem files (are they super old or super new?

This is a super old file from ~2009. I checked another file from ~2014, and it loads fine. Maybe this helps?

I checked Wonambi's loader, and it seems to do something similar as well.

theairbend3r avatar May 07 '25 14:05 theairbend3r

Do you mean setting the offset=None right because there's no parameter called format in this function. I set offset=None but no difference

yeah sorry :)

This is a super old file from ~2009. I checked another file from ~2014 and it loads fine. Maybe this helps?

Yeah, that helps. We made this reader in ~2017 (based on git history), so they likely changed their format in some way.

I checked wonambi's loader and it seems to something similar as well

You mean it fails? (I just want to be clear.)

If Wonambi fails as well, then I think the issue is that the file format changed. For file format changes we would need test files, so I think we would be out of luck, because we would honestly need to analyze the file byte by byte to figure this out (or, if you could request a spec sheet from the company, we could build off of that).
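For that kind of byte-by-byte comparison, even a minimal diff helper over two header dumps can localize where the formats diverge (a sketch, not part of Neo; the example bytes are made up):

```python
def diff_bytes(a: bytes, b: bytes):
    # Return (offset, byte_in_a, byte_in_b) for every position where
    # two byte strings differ -- e.g. comparing the first few hundred
    # header bytes of a 2009 file against those of a 2014 file.
    return [(off, x, y) for off, (x, y) in enumerate(zip(a, b)) if x != y]

old_header = b"\x04\x00\x01\x00"  # made-up example bytes
new_header = b"\x04\x00\x02\x00"
print(diff_bytes(old_header, new_header))  # -> [(2, 1, 2)]
```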

zm711 avatar May 07 '25 15:05 zm711

yeah that helps. We made this reader ~2017 (based on git history) so they likely changed their format in some way.

Ah, okay, yeah, then it makes sense that a really old file does not work.

you mean it fails? (I just want to be clear)

No, Wonambi works. I am able to load the file in Wonambi via wonambi.Dataset('path').read_data().data. I do get a runtime warning (RuntimeWarning: overflow encountered in scalar multiply f.seek(pos + i_ch * CHAN_LENGTH, 0)), but the array loads.

For file format changes we would need test files, so I think we would be out of luck because we would honestly need to analyze byte by byte to figure this out (or if you could request a spec sheet from the company we could build off of the spec sheet).

Yeah, that's understandable. It seems unlikely, but I will let you know if I am able to get my hands on a spec sheet.

theairbend3r avatar May 07 '25 15:05 theairbend3r

I tried changing the title, so hopefully if some other group has a file they can share and they stumble on this, they will know we need test files. If I get some time to read Wonambi (if it's fully open source), I'll see if I can find out how they handle this.

zm711 avatar May 07 '25 15:05 zm711

Actually, it's interesting, because they are doing the same thing we are (using the read_f in the class; see here). So the rate coefficient is being read with an unpack('H'). Let me try a PR and just see what happens on our test suite.

zm711 avatar May 07 '25 16:05 zm711