OMOP tutorial

This notebook shows examples of how to use the cycquery tool to query EHR databases that follow the OMOP common data model. Each query is limit to 100 rows (for quick results).

We showcase the examples on:

  1. Synthea in OMOP format.

    • First, generate synthea data using their releases. We used v2.7.0 to generate data .

    • Follow instructions provided in ETL-Synthea to load the CSV data into a postgres database, and perform ETL to load the data into OMOP format.

Imports and instantiate OMOPQuerier.

Pass in the schema_name which is the name of the postgres schema which houses all the OMOP tables.

[1]:
"""OMOP tutorial."""

import pandas as pd

import cycquery.ops as qo
from cycquery import OMOPQuerier


querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="synthea_integration_test",
    user="postgres",
    password="pwd",
    schema_name="cdm_synthea10",
)
# List all tables.
querier.list_tables("cdm_synthea10")
2024-11-20 06:23:35,535 INFO cycquery.orm    - Database setup, ready to run queries!
[1]:
['cdm_synthea10.all_visits',
 'cdm_synthea10.assign_all_visit_ids',
 'cdm_synthea10.care_site',
 'cdm_synthea10.cdm_source',
 'cdm_synthea10.cohort',
 'cdm_synthea10.cohort_definition',
 'cdm_synthea10.concept',
 'cdm_synthea10.concept_ancestor',
 'cdm_synthea10.concept_class',
 'cdm_synthea10.concept_relationship',
 'cdm_synthea10.concept_synonym',
 'cdm_synthea10.condition_era',
 'cdm_synthea10.condition_occurrence',
 'cdm_synthea10.cost',
 'cdm_synthea10.death',
 'cdm_synthea10.device_exposure',
 'cdm_synthea10.domain',
 'cdm_synthea10.dose_era',
 'cdm_synthea10.drug_era',
 'cdm_synthea10.drug_exposure',
 'cdm_synthea10.drug_strength',
 'cdm_synthea10.episode',
 'cdm_synthea10.episode_event',
 'cdm_synthea10.fact_relationship',
 'cdm_synthea10.final_visit_ids',
 'cdm_synthea10.location',
 'cdm_synthea10.measurement',
 'cdm_synthea10.metadata',
 'cdm_synthea10.note',
 'cdm_synthea10.note_nlp',
 'cdm_synthea10.observation',
 'cdm_synthea10.observation_period',
 'cdm_synthea10.payer_plan_period',
 'cdm_synthea10.person',
 'cdm_synthea10.procedure_occurrence',
 'cdm_synthea10.provider',
 'cdm_synthea10.relationship',
 'cdm_synthea10.source_to_concept_map',
 'cdm_synthea10.source_to_source_vocab_map',
 'cdm_synthea10.source_to_standard_vocab_map',
 'cdm_synthea10.specimen',
 'cdm_synthea10.visit_detail',
 'cdm_synthea10.visit_occurrence',
 'cdm_synthea10.vocabulary']

Example 1. Get all patient visits in or after 2010.

[2]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits = visits.run(limit=100)
print(f"{len(visits)} rows extracted!")
pd.to_datetime(visits["visit_start_date"]).dt.year.value_counts().sort_index()
2024-11-20 06:23:37,787 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:23:37,789 INFO cycquery.utils.profile - Finished executing function run_query in 0.985569 s
100 rows extracted!
[2]:
visit_start_date
2010     3
2011     2
2012     2
2013     5
2014    11
2015     6
2016    10
2017     8
2018    15
2019    10
2020     8
2021    10
2022     6
2023     4
Name: count, dtype: int64

Example 2. Get measurements for all visits in or after 2020.

[3]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2020-01-01"))
measurements = querier.measurement()
visits_measurements = visits.join(
    join_table=measurements,
    on="visit_occurrence_id",
).run(limit=100)
print(f"{len(visits_measurements)} rows extracted!")
2024-11-20 06:23:40,249 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:23:40,251 INFO cycquery.utils.profile - Finished executing function run_query in 1.885854 s
100 rows extracted!
  1. MIMIC-III v1.4 in OMOP format.

  • First, setup the MIMIC-III database according to the instructions in mimic-code.

  • Perform the ETL in the mimic-omop repo.

  • The database is assumed to be hosted using postgres. Update the config parameters such as username and password, passed to MIMICIIIQuerier accordingly.

Imports and instantiate OMOPQuerier.

Pass in the schema_name which is the name of the postgres schema which houses all the OMOP tables.

[4]:
querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="mimiciii",
    user="postgres",
    password="pwd",
    schema_name="omop",
)
# List all schemas.
querier.list_schemas()
2024-11-20 06:23:48,129 INFO cycquery.orm    - Database setup, ready to run queries!
[4]:
['information_schema', 'mimiciii', 'omop', 'public']

Example 1. Get all patient visits that ended in a mortality outcome in or after 2010.

[5]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits_concept_mapped = querier.map_concept_ids_to_name(
    visits,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
    ],
)
visits_concept_mapped_died = visits_concept_mapped.ops(
    qo.ConditionSubstring("discharge_to_concept_name", "died"),
).run()
print(f"{len(visits_concept_mapped_died)} rows extracted!")
2024-11-20 06:23:55,566 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:23:55,568 INFO cycquery.utils.profile - Finished executing function run_query in 1.264936 s
5815 rows extracted!

Example 2. Get all measurements for female patient visits with sepsis diagnoses, that ended in a mortality outcome.

[6]:
persons = querier.person()
persons = persons.ops(qo.ConditionSubstring("gender_concept_name", "FEMALE"))
visits = querier.visit_occurrence()
person_visits = persons.join(visits, on="person_id")
conditions = querier.omop.condition_occurrence()
person_visits_conditions = person_visits.join(
    conditions,
    on="visit_occurrence_id",
    isouter=True,
)
measurement = querier.measurement()
person_visits_conditions_measurements = person_visits_conditions.join(
    measurement,
    on="visit_occurrence_id",
    isouter=True,
)
person_visits_conditions_measurements = querier.map_concept_ids_to_name(
    person_visits_conditions_measurements,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
        "condition_concept_id",
    ],
)
ops = qo.Sequential(
    qo.ConditionSubstring("discharge_to_concept_name", "died"),
    qo.ConditionSubstring("condition_concept_name", "sepsis"),
)
cohort = person_visits_conditions_measurements.ops(ops).run(limit=100)
print(f"{len(cohort)} rows extracted!")
2024-11-20 06:32:59,003 INFO cycquery.orm    - Query returned successfully!
2024-11-20 06:32:59,005 INFO cycquery.utils.profile - Finished executing function run_query in 543.173872 s
100 rows extracted!
[7]:
cohort["measurement_concept_name"].value_counts()
[7]:
measurement_concept_name
No matching concept    100
Name: count, dtype: int64[pyarrow]