mimic3wdb-matched RECORDS file has too many entries
We are trying to download the mimic3wdb-matched database via wfdb.io.dl_database like so:
wfdb.io.dl_database("mimic3wdb-matched", "mimic3wdb-matched", records='all', annotators='all', keep_subdirs=True, overwrite=False)
After a long wait, we get an error indicating a missing file:
wfdb.io._url.NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488/3783537_10000.hea
While investigating, we found that the corresponding RECORDS file lists more records than are actually present in the database directory: https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488/RECORDS
RECORDS file:
Actual content:
wfdb.io.dl_database builds the download URLs from this RECORDS file, so every listed record whose files are missing on the server produces the 404 error shown above.
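To see which entries of a RECORDS file have no matching header on the server, something like the following could be used. This is only a minimal sketch using the Python standard library, not part of wfdb: the helper names (parse_records, url_exists, missing_headers) are made up for illustration, and it assumes the server answers HEAD requests and that each record's header lives at <directory>/<record>.hea, as in the failing URL above.

```python
import urllib.error
import urllib.request
from typing import Callable, List


def parse_records(text: str) -> List[str]:
    """Return the non-empty lines of a RECORDS file, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def url_exists(url: str, timeout: float = 30.0) -> bool:
    """Return True if a HEAD request succeeds, False on a 404.

    Assumption: the server supports HEAD requests. Other HTTP errors
    are re-raised so they are not mistaken for missing files.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def missing_headers(
    directory_url: str,
    records_text: str,
    exists: Callable[[str], bool] = url_exists,
) -> List[str]:
    """List the records whose .hea file is not found under directory_url.

    directory_url must end with a slash.
    """
    return [
        record
        for record in parse_records(records_text)
        if not exists(f"{directory_url}{record}.hea")
    ]
```

Passing the text of the RECORDS file linked above together with its directory URL (https://physionet.org/files/mimic3wdb-matched/1.0/p01/p017488/) would return the records that trigger the 404. The exists parameter is injectable so the logic can be checked without network access.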
Some questions:
- Can someone update the RECORDS file to reflect the actual database content?
- The download via wfdb.io.dl_database is excruciatingly slow. Would it make sense to rewrite wfdb.io.dl_database to use multi-threading? Or what approach do you use to dump the whole database efficiently?
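For illustration, a threaded download that keeps going when individual files are missing might look roughly like this. It is a sketch under assumptions, not how wfdb.io.dl_database works: fetch and download_all are hypothetical helpers built on the standard library, the default of 8 workers is an arbitrary choice, and the list of relative file paths has to be built by the caller (for example from the RECORDS files).

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable


def fetch(url: str, dest: Path) -> None:
    """Download url to dest, skipping files that already exist.

    The data is written to a temporary ".part" file first and renamed
    afterwards, so an interrupted download is not mistaken for a
    complete file on the next run.
    """
    if dest.exists():
        return
    dest.parent.mkdir(parents=True, exist_ok=True)
    partial = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(url, partial)
    os.replace(partial, dest)


def download_all(
    base_url: str,
    relative_paths: Iterable[str],
    out_dir: str,
    fetch_one: Callable[[str, Path], None] = fetch,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Download files in parallel and return {relative_path: error} for failures.

    A failing file (such as a 404) is recorded instead of aborting
    the whole run. base_url must end with a slash.
    """
    failed: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fetch_one, base_url + rel, Path(out_dir) / rel): rel
            for rel in relative_paths
        }
        for future in as_completed(futures):
            try:
                future.result()
            except OSError as err:
                # urllib's HTTPError and URLError are subclasses of OSError.
                failed[futures[future]] = str(err)
    return failed
```

The returned dictionary makes it easy to see afterwards which files could not be fetched, rather than hitting the first 404 after a long wait. Whether many parallel requests are acceptable to the server is not something this sketch can answer, so the worker count should be chosen conservatively.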
Thanks for pointing this out. This is not a bug in wfdb-python; it's a bug in the database.
The RECORDS file is (probably) correct; the set of files on PhysioNet is wrong. It looks like some of the files are present in mimic3wdb but were not properly linked into the mimic3wdb-matched directory.
(One may also ask why on earth this record is split into over 10000 tiny segments. I have no idea.)