MIMIC-IV tutorial¶
This notebook shows examples of how to use the cycquery tool on MIMIC-IV v2.0.
Each query is limit to 100 rows (for quick results).
First, setup the MIMIC-IV database according to the instructions in mimic-code.
The database is assumed to be hosted using postgres. Update the config parameters such as username and password, passed to
MIMICIVQuerier
accordingly.
Imports and instantiate MIMICIVQuerier
¶
[1]:
"""MIMICIV tutorial."""
import cycquery.ops as qo
from cycquery import MIMICIVQuerier
querier = MIMICIVQuerier(
dbms="postgresql",
port=5432,
host="localhost",
database="mimiciv-2.0",
user="postgres",
password="pwd",
)
# List all schemas.
querier.list_schemas()
2024-11-20 06:18:11,891 INFO cycquery.orm - Database setup, ready to run queries!
[1]:
['fhir_etl',
'fhir_trm',
'information_schema',
'mimic_fhir',
'mimiciv_derived',
'mimiciv_ed',
'mimiciv_hosp',
'mimiciv_icu',
'mimiciv_note',
'public']
Example 1. Get all patient admissions from 2021 or later (approx year of admission)¶
[2]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
qo.ConditionAfterDate("admittime", "2021-01-01"),
)
patient_admissions = patient_admissions.ops(ops).run(limit=100)
print(f"{len(patient_admissions)} rows extracted!")
2024-11-20 06:18:14,622 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:18:14,624 INFO cycquery.utils.profile - Finished executing function run_query in 0.391459 s
100 rows extracted!
Example 2. Get all patient encounters with diagnoses (schizophrenia
in ICD-10 long title), in the year 2015.¶
[3]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
qo.ConditionInYears("admittime", "2015"),
)
patient_admissions = patient_admissions.ops(ops)
diagnoses = querier.diagnoses_icd()
diagnoses_ops = qo.Sequential(
qo.ConditionEquals("icd_version", 10),
qo.ConditionSubstring("long_title", "schizophrenia"),
)
diagnoses = diagnoses.ops(diagnoses_ops)
patient_admissions_diagnoses = patient_admissions.join(
join_table=diagnoses,
on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_diagnoses)} rows extracted!")
2024-11-20 06:18:17,663 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:18:17,665 INFO cycquery.utils.profile - Finished executing function run_query in 2.965096 s
100 rows extracted!
Example 3. Advanced - uses ConditionRegexMatch
from cycquery.ops
. Get all patient encounters with diagnoses (ICD-9 long title contains schizophrenia
and chronic
), in the year 2015.¶
[4]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
qo.ConditionInYears("admittime", "2015"),
)
patient_admissions = patient_admissions.ops(ops)
diagnoses = querier.diagnoses_icd()
diagnoses_ops = qo.Sequential(
qo.ConditionEquals("icd_version", 9),
qo.ConditionRegexMatch("long_title", r"(?=.*schizophrenia)(?=.*chronic)"),
)
diagnoses = diagnoses.ops(diagnoses_ops)
patient_admissions_diagnoses = patient_admissions.join(
join_table=diagnoses,
on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_diagnoses)} rows extracted!")
2024-11-20 06:18:21,483 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:18:21,485 INFO cycquery.utils.profile - Finished executing function run_query in 3.748332 s
82 rows extracted!
Example 4. Get routine vital signs for patients from year 2015.¶
[5]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
qo.ConditionInYears("admittime", "2015"),
)
patient_admissions = patient_admissions.ops(ops)
chart_events = querier.chartevents()
vitals = chart_events.ops(qo.ConditionEquals("category", "Routine Vital Signs"))
patient_admissions_vitals = patient_admissions.join(
join_table=vitals,
on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_vitals)} rows extracted!")
2024-11-20 06:18:25,954 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:18:25,956 INFO cycquery.utils.profile - Finished executing function run_query in 4.399047 s
100 rows extracted!
Example 5. Get hemoglobin lab tests for patients from year 2009.¶
[6]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
qo.ConditionInYears("admittime", "2009"),
)
patient_admissions = patient_admissions.ops(ops)
chart_events = querier.chartevents()
labs = chart_events.ops(qo.ConditionEquals("label", "hemoglobin"))
patient_admissions_labs = patient_admissions.join(
join_table=labs,
on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_labs)} rows extracted!")
2024-11-20 06:18:33,541 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:18:33,543 INFO cycquery.utils.profile - Finished executing function run_query in 7.514830 s
100 rows extracted!
Example 6. Get radiology reports and filter on keywords lymphadenopathy
and infectious
occurring together from year 2009.¶
[7]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
qo.ConditionInYears("admittime", "2009"),
)
patient_admissions = patient_admissions.ops(ops)
radiology_notes = querier.mimiciv_note.radiology()
radiology_notes_ops = qo.Sequential(
qo.And(
qo.ConditionLike("text", "% lymphadenopathy %"),
qo.ConditionLike("text", "% infectious %"),
),
)
radiology_notes = radiology_notes.ops(radiology_notes_ops)
patient_admissions_radiology_notes = patient_admissions.join(
join_table=radiology_notes,
on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_radiology_notes)} rows extracted!")
2024-11-20 06:23:27,194 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:23:27,195 INFO cycquery.utils.profile - Finished executing function run_query in 293.589607 s
100 rows extracted!
Example 7. Get all female patient encounters from year 2015, and return batches partitioned and indexed on subject_id
.¶
[8]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
qo.ConditionInYears("admittime", "2015"),
qo.Cast("gender", "str"),
qo.ConditionEquals("gender", "F"),
)
patient_admissions = patient_admissions.ops(ops)
patient_admissions = patient_admissions.run(
index_col="subject_id",
batch_mode=True,
batch_size=1000,
)
patient_admissions_df = next(patient_admissions)
print(f"{len(patient_admissions_df)} rows extracted!")
2024-11-20 06:23:28,413 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:23:28,415 INFO cycquery.utils.profile - Finished executing function run_query in 1.116994 s
999 rows extracted!
Example 8. Running a raw SQL string.¶
[9]:
data = querier.db.run_query("SELECT * FROM mimiciv_hosp.admissions LIMIT 100")
print(f"{len(data)} rows extracted!")
2024-11-20 06:23:28,769 INFO cycquery.orm - Query returned successfully!
2024-11-20 06:23:28,771 INFO cycquery.utils.profile - Finished executing function run_query in 0.013242 s
100 rows extracted!