MIMIC-IV tutorial

This notebook shows examples of how to use the cycquery tool on MIMIC-IV v2.0.

Each query is limit to 100 rows (for quick results).

  • First, setup the MIMIC-IV database according to the instructions in mimic-code.

  • The database is assumed to be hosted using postgres. Update the config parameters such as username and password, passed to MIMICIVQuerier accordingly.

Imports and instantiate MIMICIVQuerier

[1]:
"""MIMICIV tutorial."""

import cycquery.ops as qo
from cycquery import MIMICIVQuerier


querier = MIMICIVQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="mimiciv-2.0",
    user="postgres",
    password="pwd",
)
# List all schemas.
querier.list_schemas()
2024-11-20 06:18:11,891 INFO cycquery.orm    - Database setup, ready to run queries!
[1]:
['fhir_etl',
 'fhir_trm',
 'information_schema',
 'mimic_fhir',
 'mimiciv_derived',
 'mimiciv_ed',
 'mimiciv_hosp',
 'mimiciv_icu',
 'mimiciv_note',
 'public']

Example 1. Get all patient admissions from 2021 or later (approx year of admission)

[2]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
    qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
    qo.ConditionAfterDate("admittime", "2021-01-01"),
)
patient_admissions = patient_admissions.ops(ops).run(limit=100)
print(f"{len(patient_admissions)} rows extracted!")
2024-11-20 06:18:14,622 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:18:14,624 INFO cycquery.utils.profile - Finished executing function run_query in 0.391459 s
100 rows extracted!

Example 2. Get all patient encounters with diagnoses (schizophrenia in ICD-10 long title), in the year 2015.

[3]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
    qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
    qo.ConditionInYears("admittime", "2015"),
)
patient_admissions = patient_admissions.ops(ops)
diagnoses = querier.diagnoses_icd()
diagnoses_ops = qo.Sequential(
    qo.ConditionEquals("icd_version", 10),
    qo.ConditionSubstring("long_title", "schizophrenia"),
)
diagnoses = diagnoses.ops(diagnoses_ops)
patient_admissions_diagnoses = patient_admissions.join(
    join_table=diagnoses,
    on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_diagnoses)} rows extracted!")
2024-11-20 06:18:17,663 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:18:17,665 INFO cycquery.utils.profile - Finished executing function run_query in 2.965096 s
100 rows extracted!

Example 3. Advanced - uses ConditionRegexMatch from cycquery.ops. Get all patient encounters with diagnoses (ICD-9 long title contains schizophrenia and chronic ), in the year 2015.

[4]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
    qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
    qo.ConditionInYears("admittime", "2015"),
)
patient_admissions = patient_admissions.ops(ops)
diagnoses = querier.diagnoses_icd()
diagnoses_ops = qo.Sequential(
    qo.ConditionEquals("icd_version", 9),
    qo.ConditionRegexMatch("long_title", r"(?=.*schizophrenia)(?=.*chronic)"),
)
diagnoses = diagnoses.ops(diagnoses_ops)
patient_admissions_diagnoses = patient_admissions.join(
    join_table=diagnoses,
    on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_diagnoses)} rows extracted!")
2024-11-20 06:18:21,483 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:18:21,485 INFO cycquery.utils.profile - Finished executing function run_query in 3.748332 s
82 rows extracted!

Example 4. Get routine vital signs for patients from year 2015.

[5]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
    qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
    qo.ConditionInYears("admittime", "2015"),
)
patient_admissions = patient_admissions.ops(ops)
chart_events = querier.chartevents()
vitals = chart_events.ops(qo.ConditionEquals("category", "Routine Vital Signs"))
patient_admissions_vitals = patient_admissions.join(
    join_table=vitals,
    on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_vitals)} rows extracted!")
2024-11-20 06:18:25,954 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:18:25,956 INFO cycquery.utils.profile - Finished executing function run_query in 4.399047 s
100 rows extracted!

Example 5. Get hemoglobin lab tests for patients from year 2009.

[6]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
    qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
    qo.ConditionInYears("admittime", "2009"),
)
patient_admissions = patient_admissions.ops(ops)
chart_events = querier.chartevents()
labs = chart_events.ops(qo.ConditionEquals("label", "hemoglobin"))
patient_admissions_labs = patient_admissions.join(
    join_table=labs,
    on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_labs)} rows extracted!")
2024-11-20 06:18:33,541 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:18:33,543 INFO cycquery.utils.profile - Finished executing function run_query in 7.514830 s
100 rows extracted!

Example 6. Get radiology reports and filter on keywords lymphadenopathy and infectious occurring together from year 2009.

[7]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
    qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
    qo.ConditionInYears("admittime", "2009"),
)
patient_admissions = patient_admissions.ops(ops)
radiology_notes = querier.mimiciv_note.radiology()
radiology_notes_ops = qo.Sequential(
    qo.And(
        qo.ConditionLike("text", "% lymphadenopathy %"),
        qo.ConditionLike("text", "% infectious %"),
    ),
)
radiology_notes = radiology_notes.ops(radiology_notes_ops)
patient_admissions_radiology_notes = patient_admissions.join(
    join_table=radiology_notes,
    on=["subject_id", "hadm_id"],
).run(limit=100)
print(f"{len(patient_admissions_radiology_notes)} rows extracted!")
2024-11-20 06:23:27,194 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:23:27,195 INFO cycquery.utils.profile - Finished executing function run_query in 293.589607 s
100 rows extracted!

Example 7. Get all female patient encounters from year 2015, and return batches partitioned and indexed on subject_id.

[8]:
patients = querier.patients()
admissions = querier.mimiciv_hosp.admissions()
patient_admissions = patients.join(admissions, on="subject_id")
ops = qo.Sequential(
    qo.AddDeltaColumn(["admittime", "dischtime"], years="anchor_year_difference"),
    qo.ConditionInYears("admittime", "2015"),
    qo.Cast("gender", "str"),
    qo.ConditionEquals("gender", "F"),
)
patient_admissions = patient_admissions.ops(ops)
patient_admissions = patient_admissions.run(
    index_col="subject_id",
    batch_mode=True,
    batch_size=1000,
)
patient_admissions_df = next(patient_admissions)
print(f"{len(patient_admissions_df)} rows extracted!")
2024-11-20 06:23:28,413 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:23:28,415 INFO cycquery.utils.profile - Finished executing function run_query in 1.116994 s
999 rows extracted!

Example 8. Running a raw SQL string.

[9]:
data = querier.db.run_query("SELECT * FROM mimiciv_hosp.admissions LIMIT 100")
print(f"{len(data)} rows extracted!")
2024-11-20 06:23:28,769 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:23:28,771 INFO cycquery.utils.profile - Finished executing function run_query in 0.013242 s
100 rows extracted!