Dataset
How can I get the CINC2021 dataset? How do I download the dataset from the URL you provided in benchmarks? I could not find prepare_dataset.py here, but I did find it in the original repo.
Just call the download method. Of course, you may also download the zip files from Google Cloud with some other tool and uncompress them manually. The prepare_dataset function in the original repo existed because I had to keep the files in specific subfolders to maintain the paths. The _ls_rec method has since been updated so that the paths are maintained in a pandas DataFrame, which makes the file-moving in prepare_dataset unnecessary; it was therefore removed.
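For the manual route, Python's stdlib zipfile is enough to uncompress the archives. A minimal sketch; the archive and file names below are placeholders created on the fly for the demo, not the real challenge files:

```python
import tempfile
import zipfile
from pathlib import Path

# Stand-in demo: build a tiny zip, then extract it the same way one would
# extract a manually downloaded archive. All file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    zpath = root / "demo_subset.zip"
    with zipfile.ZipFile(zpath, "w") as zf:
        zf.writestr("training/A0001.mat", b"fake-bytes")

    dest = root / "data"
    with zipfile.ZipFile(zpath) as zf:
        zf.extractall(dest)  # uncompress into the target data directory

    extracted = sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*.mat"))

print(extracted)
```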
I downloaded the CINC2021 dataset from https://physionet.org/content/challenge-2021/#files and want to run trainer.py from benchmarks/cinc2021. I also added ds_train and ds_val:

```python
TrainCfg.db_dir = 'data/CINC2021/physionet.org/files/challenge-2021/1.0.3/training/'
ds_train = CINC2021(TrainCfg, training=True, lazy=True)
ds_val = CINC2021(TrainCfg, training=False, lazy=True)
```
I am getting the error below:

```
File "trainer.py", line 423, in
```
It's a typo in this file, which probably crept in during a copy-paste (from torch_ecg/databases/datasets/cinc2021/cinc2021_dataset.py). The closing bracket of the len call was missing and had been added in the wrong place (perhaps by Copilot?). It is now corrected in 20203caab4945994bfff6df7df702b1656406600.
Hi, I'm trying to run trainer.py for train_hybrid_cpsc2020. I have downloaded the CPSC 2020 dataset and specified the data path inside cfg.py like this:

```python
BaseCfg.db_dir = 'D:/AUT/Data_Lab/Implementation/TinyML/data/TrainingSet/'
```

TrainingSet contains two subfolders, data and ref, each of which holds 10 .mat files. But I come across this error whenever I run trainer.py:
```
File "C:\Users\AK\miniconda3\envs\cpsc\Lib\site-packages\torch\utils\data\dataloader.py", line 350, in __init__
    sampler = RandomSampler(dataset, generator=generator)  # type: ignore[arg-type]
File "C:\Users\AK\miniconda3\envs\cpsc\Lib\site-packages\torch\utils\data\sampler.py", line 143, in __init__
    raise ValueError(f"num_samples should be a positive integer value, but got num_samples={self.num_samples}")
ValueError: num_samples should be a positive integer value, but got num_samples=0
```

Any advice on how I can fix this?
It seems that the data reader did not find the recording files. The CPSC2020 data reader searches for the recordings and annotation files using the following method:
```python
def _ls_rec(self) -> None:
    """Find all records in the database directory
    and store them (path, metadata, etc.) in some private attributes.
    """
    self._df_records = pd.DataFrame()
    n_records = 10
    all_records = [f"A{i:02d}" for i in range(1, 1 + n_records)]
    self._df_records["path"] = [
        path for path in self.db_dir.rglob(f"*.{self.rec_ext}") if path.stem in all_records
    ]
    self._df_records["record"] = self._df_records["path"].apply(lambda x: x.stem)
    self._df_records.set_index("record", inplace=True)
    all_annotations = [f"R{i:02d}" for i in range(1, 1 + n_records)]
    df_ann = pd.DataFrame()
    df_ann["ann_path"] = [
        path for path in self.db_dir.rglob(f"*.{self.ann_ext}") if path.stem in all_annotations
    ]
    df_ann["record"] = df_ann["ann_path"].apply(lambda x: x.stem.replace("R", "A"))
    df_ann.set_index("record", inplace=True)
    # take the intersection of `df_ann` and `self._df_records` by index
    self._df_records = self._df_records.join(df_ann, how="inner")
    if len(self._df_records) > 0:
        if self._subsample is not None:
            size = min(
                len(self._df_records),
                max(1, int(round(self._subsample * len(self._df_records)))),
            )
            self._df_records = self._df_records.sample(n=size, random_state=DEFAULTS.SEED, replace=False)
    self._all_records = self._df_records.index.tolist()
    self._all_annotations = self._df_records["ann_path"].apply(lambda x: x.stem).tolist()
```
Theoretically, you can pass any of its parent directories as db_dir, since pathlib.Path.rglob searches recursively.
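As a self-contained illustration (a hypothetical temporary directory standing in for the real dataset), the rglob matching plus inner join pairs each recording with its annotation file and silently drops anything unmatched:

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "ref").mkdir()
    # create recordings A01..A03 but annotations only for R01..R02
    for i in range(1, 4):
        (root / "data" / f"A{i:02d}.mat").touch()
    for i in range(1, 3):
        (root / "ref" / f"R{i:02d}.mat").touch()

    all_records = [f"A{i:02d}" for i in range(1, 11)]
    df_records = pd.DataFrame()
    df_records["path"] = [p for p in root.rglob("*.mat") if p.stem in all_records]
    df_records["record"] = df_records["path"].apply(lambda p: p.stem)
    df_records.set_index("record", inplace=True)

    df_ann = pd.DataFrame()
    df_ann["ann_path"] = [p for p in root.rglob("*.mat") if p.stem.startswith("R")]
    df_ann["record"] = df_ann["ann_path"].apply(lambda p: p.stem.replace("R", "A"))
    df_ann.set_index("record", inplace=True)

    # the inner join keeps only records that have a matching annotation,
    # so A03 (no R03) disappears -- exactly the failure mode when the
    # reader finds zero pairs and the DataLoader sees an empty dataset
    paired = df_records.join(df_ann, how="inner")

print(sorted(paired.index))  # ['A01', 'A02']
```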
I think I know the reason now. The CPSC2020 dataset uses sliced recordings, since the original recordings are fairly long. So you should call the persistence method first; it takes quite a long time to slice the recordings.
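The slicing itself is conceptually just windowing a long signal. A simplified numpy sketch under that assumption; the function name and parameters are illustrative, not the library's API, and the real persistence step also preprocesses and saves the slices to disk, which this omits:

```python
import numpy as np

def slice_recording(sig: np.ndarray, siglen: int, stride: int) -> np.ndarray:
    """Cut a long 1-D signal into fixed-length, possibly overlapping segments.

    A simplified stand-in for the CPSC2020 slicing (persistence) step.
    """
    n_segments = max(0, (len(sig) - siglen) // stride + 1)
    if n_segments == 0:
        # signal shorter than one window: nothing to slice
        return np.empty((0, siglen), dtype=sig.dtype)
    return np.stack([sig[i * stride : i * stride + siglen] for i in range(n_segments)])

# a 2000-sample toy signal sliced into non-overlapping 400-sample windows
segments = slice_recording(np.arange(2000), siglen=400, stride=400)
print(segments.shape)  # (5, 400)
```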
Thank you for your guidance. It seems that training requires a CNN.h5 and a CRNN.h5 file located in the signal_processing/ecg_rpeaks_dl_models directory, but I only have the corresponding JSON files. It's worth noting that I've only run trainer.py. Should I do anything before running trainer.py? Could you please help me with this one as well?
I added automatic downloading of these models, which can be found at https://opensz.oss-cn-beijing.aliyuncs.com/ICBEB2020/file/CPSC2019-opensource.zip. However, these models were trained with a much older version of Keras, so one might have trouble loading them. I also removed the auto-loading of deep learning models in the signal_processing module.
These changes are currently on the dev branch and will be merged into the master branch soon.