mimic3wdb-matched RECORDS file has too many entries

Open • tecamenz opened this issue 2 years ago • 1 comment

We are trying to download the mimic3wdb-matched database via wfdb.io.dl_database like so:

wfdb.io.dl_database("mimic3wdb-matched", "mimic3wdb-matched", records='all', annotators='all', keep_subdirs=True, overwrite=False)

After a long wait, we get an error indicating a missing file: wfdb.io._url.NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488/3783537_10000.hea

While investigating, we found that the corresponding RECORDS file lists more records than actually exist in the database: https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488/RECORDS

RECORDS file:

Actual content:

wfdb.io.dl_database generates the download URLs from this RECORDS file, so every entry without a matching file on the server leads to the 404 error above.
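For anyone who wants to reproduce the mismatch, here is a minimal sketch of the check we did. It is not part of wfdb: `header_exists` and `find_missing_records` are hypothetical helper names, and the only assumption about the server is the `<directory URL>/<record>.hea` layout visible in the error message above.

```python
import urllib.error
import urllib.request


def header_exists(base_url, record_name, timeout=30):
    """Return True if <base_url>/<record_name>.hea answers a HEAD request.

    Hypothetical helper, not a wfdb function. A 404 means "missing";
    any other HTTP error is re-raised so it is not mistaken for one.
    """
    request = urllib.request.Request(f"{base_url}/{record_name}.hea", method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def find_missing_records(records_text, exists):
    """Return the RECORDS entries (one per line) for which exists(name) is false."""
    names = [line.strip() for line in records_text.splitlines() if line.strip()]
    return [name for name in names if not exists(name)]
```

To run it against the directory above, download the RECORDS file as text and pass `lambda name: header_exists("https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488", name)` as `exists`. The check is sequential, so it is slow for a directory with thousands of segments.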

Some questions:

Can someone update the RECORDS file to reflect the actual database content?
The download via wfdb.io.dl_database is excruciatingly slow. Would it make sense to rewrite wfdb.io.dl_database to use multi-threading? Or what approach do you use to dump the whole database efficiently?
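To make the multi-threading idea concrete, here is a rough sketch of what we have in mind, using only the standard library. This is not how wfdb.io.dl_database is implemented; `fetch_to_file` and `download_many` are hypothetical names, and the worker count is an arbitrary assumption that should stay small to be polite to the server.

```python
import concurrent.futures
import os
import shutil
import urllib.request


def fetch_to_file(url, dest_path, timeout=60):
    """Download one URL to dest_path (hypothetical helper, not part of wfdb)."""
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)


def download_many(base_url, relative_paths, dest_dir,
                  fetch=fetch_to_file, max_workers=8):
    """Fetch base_url/<path> for every path using a pool of threads.

    Returns (succeeded, failed), where failed maps path -> exception,
    so a single missing file does not abort the whole run.
    """
    succeeded, failed = [], {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fetch, f"{base_url}/{path}", os.path.join(dest_dir, path)): path
            for path in relative_paths
        }
        for future in concurrent.futures.as_completed(futures):
            path = futures[future]
            try:
                future.result()
                succeeded.append(path)
            except Exception as err:  # record the failure and keep going
                failed[path] = err
    return succeeded, failed
```

Collecting failures instead of raising on the first one would also let a download continue past entries like the missing .hea file above, and the `failed` dict then doubles as a list of files to report.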

Sep 22 '23 07:09 tecamenz

Thanks for pointing this out. This is not a bug in wfdb-python, it's a bug in the database.

The RECORDS file is (probably) correct; the set of files on PhysioNet is wrong. It looks like some of the files are present in mimic3wdb but were not properly linked into the mimic3wdb-matched directory.

(One may also ask why on earth this record is split into over 10000 tiny segments. I have no idea.)

Sep 29 '23 17:09 bemoody