pyedflib icon indicating copy to clipboard operation
pyedflib copied to clipboard

EDF open fails when physical max and min are both 0

Open adammj opened this issue 1 year ago • 5 comments

Line 214 of edflib.c tests for the following condition edfhdr->edfparam[i].phys_max==edfhdr->edfparam[i].phys_min), which then throws an error.

I understand in a technical sense that this shouldn't be possible. However, this is the real world, and I have thousands of EDF files (recorded by various groups) that forgot to set the physical max+min for 1 or more channels (they are both set to 0). This leads to the condition that this EDF library will refuse to open those files, even though only 1 or 2 channels are "bad". However, some others, like the native one in MATLAB and some EDF viewers, will gladly open these same files.

I'd like to propose that instead the file still be opened (maybe in all cases, but certainly in the case that both physical min and max are 0). Looking over the code elsewhere, I don't see any instances where this would cause new issues. It's just that reading back the physical values for the "bad" channel would all be 0s. Whereas reading the digital values will still report whatever was actually stored.

Another option, which I think would require more work, is to modify the initial load to act like those channels don't exist.

If this is acceptable, I can certainly make the changes.

adammj avatar Jun 17 '24 18:06 adammj

That's a good idea, thanks!

Yes, there are quite a couple of things that the original C library is quite strict on, not only this, see e.g. https://github.com/holgern/pyedflib/pull/114 , however we are only a wrapper for edflib

However, the more additions we include and monkey-patch in edflib.c, the harder it get's to keep up with upstream changes. It was already a pain to update to the most recent version, which is already a few years out of date.

On the long run it would be great to keep the edflib.c as is from upstream and insert all additions as patches elsewhere, however, I haven't looked into what would be the best approach for that.

Do you have suggestions on how to solve this?

skjerns avatar Jun 18 '24 05:06 skjerns

I noticed the upstream library after I submitted this, and have already heard back from the developer that they won't allow these sorts of compromises. I guess they haven't heard of Postel's law ("be conservative in what you do, be liberal in what you accept from others").

Regardless, now that I understand your desire to minimize work in keeping up with upstream changes, I'll mull it over and see if I can come up with a better solution than the one I currently implemented on my local copy.

I'll get back with you.

adammj avatar Jun 19 '24 01:06 adammj

Yes, teuniz is quite strict with implementation specifics, but I guess it has it's advantages.

If you find a way to implement changes without altering the C code that would be great! I'm not very proficient at C unfortunately and I haven't gotten around to look into options on how to monkey-patch or overload certain functions. That would be very useful for other things as well and for long-term maintainability.

skjerns avatar Jun 19 '24 06:06 skjerns

Unfortunately, I don't see a way around altering the C code (which I have experience with). Looking at the teuniz's C and Python code closer, and testing it against my collection of 30k EDF files, I've found an additional half dozen+ cases distributed over 4k files which also fail. It's obvious that the code is way too strict in what it allows (and their "solution" of providing helper utilities to rewrite the EDF files is not acceptable where strict provenance is required—which should be all of science).

adammj avatar Jun 21 '24 01:06 adammj

Thanks!

As far as my research goes, using macros to redirect the function calls could work?

Eg


#define foo my_foo  // Redirect foo to my_foo

#include "edflib.h"  // Include the original header

#undef foo  // Remove the macro redirection to avoid pollution

// Your code that calls foo()
int main() {
    int result = foo(10);  // This will call my_foo(10)
    return 0;
}

skjerns avatar Jun 21 '24 07:06 skjerns

Could you not just set the phys_min to 0 and (if phys_max is also 0), the phys_max to 0.000001? Then the loading will continue and the data will still be all 0.

In most of the cases these channels probably contain 0's anyway? (reason for faulty phys_min and phys_max)

TrianecT-Wouter avatar Dec 16 '24 23:12 TrianecT-Wouter

Then the loading will continue and the data will still be all 0.

In most of the cases these channels probably contain 0's anyway? (reason for faulty phys_min and phys_max)

No, this is not the case. The "problem" is whatever tool wrote the EDF files set the physical min and max as 0, when there is still valid data, probably because the conversion factor from digital to physical wasn't set. This is a shortcoming of the EDF format.

adammj avatar Dec 17 '24 01:12 adammj

No, this is not the case. The "problem" is whatever tool wrote the EDF files set the physical min and max as 0, when there is still valid data, probably because the conversion factor from digital to physical wasn't set. This is a shortcoming of the EDF format.

quick fix would be to edit the raw header of the specific file and replace the value with a valid one, should not be too difficult, as it's always at the same position..

Until then, I'll close this issue as we will not implement this in the forseeable future

skjerns avatar Jan 23 '25 13:01 skjerns