Building a db file on a subset of the MIMIC-III data

Open pshuwei opened this issue 2 years ago • 5 comments

Prerequisites

  • [X] Put an X between the brackets on this line if you have done all of the following:
    • Checked the online documentation: https://mimic.mit.edu/
    • Checked that your issue isn't already addressed: https://github.com/MIT-LCP/mimic-code/issues?utf8=%E2%9C%93&q=

Description

Description of the issue, including:

  • what you have tried: I have successfully built a MIMIC db file using the shell program for SQLite.

I am just curious whether this code can also be used to build a db file on a subset of the data.

Thanks!

pshuwei avatar Jul 05 '23 15:07 pshuwei

What format do you expect a db file to be?

alistairewj avatar Jul 05 '23 18:07 alistairewj

> What format do you expect a db file to be?

To clarify, I have managed to run the shell program that compiles all the csv.gz files into a single SQLite database file. I was wondering whether I could do the same thing with, say, 10% of the MIMIC-III patients, or any other fraction of the dataset.

pshuwei avatar Jul 05 '23 18:07 pshuwei

Yes, for sure! You can run the same code using the demo dataset: https://physionet.org/content/mimiciii-demo/

That would give you a 100-patient subset.

alistairewj avatar Jul 05 '23 19:07 alistairewj

Hi, thanks for your response.

Does the MIMIC demo dataset contain the full information for its 100 patients, or is it a condensed version?

Also, what if I wanted to increase from 100 to 200 patients? How would I go about that?

pshuwei avatar Jul 11 '23 14:07 pshuwei

The demo dataset is simply a filter on all the tables in the database, requiring the subject_id to be in a list of 100 subject_id values selected a priori. We also remove the noteevents table.

You can easily recreate this if you have the full dataset and expand the subject_id list.

alistairewj avatar Jul 11 '23 15:07 alistairewj
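
For anyone landing here later, the filtering described above could be sketched roughly as follows. This is not the script used to build the official demo; it is a minimal sketch that assumes the full dataset sits in one directory as csv.gz files, that the patients table is named PATIENTS.csv.gz, and that tables without a subject_id column (the dictionary tables, for example) should be copied unchanged. The function names `sample_subject_ids`, `filter_table` and `build_subset` are made up for illustration.

```python
import csv
import gzip
import random
from pathlib import Path


def sample_subject_ids(patients_path, fraction, seed=0):
    """Return a random subset of subject_id values from the patients csv.gz file."""
    with gzip.open(patients_path, "rt", newline="") as f:
        reader = csv.reader(f)
        # Header case differs between releases, so match case-insensitively.
        header = [name.lower() for name in next(reader)]
        col = header.index("subject_id")
        ids = sorted({row[col] for row in reader})
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return set(random.Random(seed).sample(ids, k))


def filter_table(src_path, dst_path, keep_ids):
    """Copy one csv.gz table, keeping only rows whose subject_id is in keep_ids.

    Tables with no subject_id column are copied whole.
    Returns the number of data rows written.
    """
    written = 0
    with gzip.open(src_path, "rt", newline="") as src, \
            gzip.open(dst_path, "wt", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow(header)
        lowered = [name.lower() for name in header]
        col = lowered.index("subject_id") if "subject_id" in lowered else None
        for row in reader:
            if col is None or row[col] in keep_ids:
                writer.writerow(row)
                written += 1
    return written


def build_subset(src_dir, dst_dir, fraction=0.1, seed=0, skip=()):
    """Write a filtered copy of every csv.gz table in src_dir into dst_dir.

    Pass skip=("NOTEEVENTS",) to mirror the demo, which drops that table.
    Assumption: the patients file is named PATIENTS.csv.gz.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    keep_ids = sample_subject_ids(src_dir / "PATIENTS.csv.gz", fraction, seed)
    for src in sorted(src_dir.glob("*.csv.gz")):
        if src.name.split(".")[0].upper() in skip:
            continue
        filter_table(src, dst_dir / src.name, keep_ids)
    return keep_ids
```

Calling something like `build_subset("mimic-iii/", "mimic-iii-10pct/", fraction=0.1)` (both paths are placeholders) should leave a directory of smaller csv.gz files that the same SQLite build step can be pointed at. To go from 100 to 200 patients instead of sampling a fraction, you could pass your own set of subject_id values straight to `filter_table`. Note that the csv module may quote fields differently from the original files, which should not matter for a standard CSV import.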