OMOP tutorialΒΆ

This notebook shows examples of how to use the cycquery tool to query EHR databases that follow the OMOP common data model. Each query is limit to 100 rows (for quick results).

We showcase the examples on:

  1. Synthea in OMOP format.

    • First, generate synthea data using their releases. We used v2.7.0 to generate data .

    • Follow instructions provided in ETL-Synthea to load the CSV data into a postgres database, and perform ETL to load the data into OMOP format.

Imports and instantiate OMOPQuerier.ΒΆ

Pass in the schema_name which is the name of the postgres schema which houses all the OMOP tables.

[1]:
"""OMOP tutorial."""

import pandas as pd

import cycquery.ops as qo
from cycquery import OMOPQuerier


querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="synthea_integration_test",
    user="postgres",
    password="pwd",
    schema_name="cdm_synthea10",
)
# List all tables.
querier.list_tables("cdm_synthea10")
2025-01-29 09:50:50,170 INFO cycquery.orm    - Database setup, ready to run queries!
[1]:
['cdm_synthea10.all_visits',
 'cdm_synthea10.assign_all_visit_ids',
 'cdm_synthea10.care_site',
 'cdm_synthea10.cdm_source',
 'cdm_synthea10.cohort',
 'cdm_synthea10.cohort_definition',
 'cdm_synthea10.concept',
 'cdm_synthea10.concept_ancestor',
 'cdm_synthea10.concept_class',
 'cdm_synthea10.concept_relationship',
 'cdm_synthea10.concept_synonym',
 'cdm_synthea10.condition_era',
 'cdm_synthea10.condition_occurrence',
 'cdm_synthea10.cost',
 'cdm_synthea10.death',
 'cdm_synthea10.device_exposure',
 'cdm_synthea10.domain',
 'cdm_synthea10.dose_era',
 'cdm_synthea10.drug_era',
 'cdm_synthea10.drug_exposure',
 'cdm_synthea10.drug_strength',
 'cdm_synthea10.episode',
 'cdm_synthea10.episode_event',
 'cdm_synthea10.fact_relationship',
 'cdm_synthea10.final_visit_ids',
 'cdm_synthea10.location',
 'cdm_synthea10.measurement',
 'cdm_synthea10.metadata',
 'cdm_synthea10.note',
 'cdm_synthea10.note_nlp',
 'cdm_synthea10.observation',
 'cdm_synthea10.observation_period',
 'cdm_synthea10.payer_plan_period',
 'cdm_synthea10.person',
 'cdm_synthea10.procedure_occurrence',
 'cdm_synthea10.provider',
 'cdm_synthea10.relationship',
 'cdm_synthea10.source_to_concept_map',
 'cdm_synthea10.source_to_source_vocab_map',
 'cdm_synthea10.source_to_standard_vocab_map',
 'cdm_synthea10.specimen',
 'cdm_synthea10.visit_detail',
 'cdm_synthea10.visit_occurrence',
 'cdm_synthea10.vocabulary']

Example 1. Get all patient visits in or after 2010.ΒΆ

[2]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits = visits.run(limit=100)
print(f"{len(visits)} rows extracted!")
pd.to_datetime(visits["visit_start_date"]).dt.year.value_counts().sort_index()
2025-01-29 09:50:51,479 INFO cycquery.orm    - Query returned successfully!
2025-01-29 09:50:51,481 INFO cycquery.utils.profile - Finished executing function run_query in 0.078109 s
100 rows extracted!
[2]:
visit_start_date
2010     3
2011     2
2012     2
2013     5
2014    11
2015     6
2016    10
2017     8
2018    15
2019    10
2020     8
2021    10
2022     6
2023     4
Name: count, dtype: int64

Example 2. Get measurements for all visits in or after 2020.ΒΆ

[3]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2020-01-01"))
measurements = querier.measurement()
visits_measurements = visits.join(
    join_table=measurements,
    on="visit_occurrence_id",
).run(limit=100)
print(f"{len(visits_measurements)} rows extracted!")
2025-01-29 09:50:52,449 INFO cycquery.orm    - Query returned successfully!
2025-01-29 09:50:52,451 INFO cycquery.utils.profile - Finished executing function run_query in 0.132927 s
100 rows extracted!
  1. MIMIC-III v1.4 in OMOP format.

  • First, setup the MIMIC-III database according to the instructions in mimic-code.

  • Perform the ETL in the mimic-omop repo.

  • The database is assumed to be hosted using postgres. Update the config parameters such as username and password, passed to MIMICIIIQuerier accordingly.

Imports and instantiate OMOPQuerier.ΒΆ

Pass in the schema_name which is the name of the postgres schema which houses all the OMOP tables.

[4]:
querier = OMOPQuerier(
    dbms="postgresql",
    port=5432,
    host="localhost",
    database="mimiciii",
    user="postgres",
    password="pwd",
    schema_name="omop",
)
# List all schemas.
querier.list_schemas()
2025-01-29 09:50:58,802 INFO cycquery.orm    - Database setup, ready to run queries!
[4]:
['information_schema', 'mimiciii', 'omop', 'public']

Example 1. Get all patient visits that ended in a mortality outcome in or after 2010.ΒΆ

[5]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits_concept_mapped = querier.map_concept_ids_to_name(
    visits,
    [
        "discharge_to_concept_id",
        "admitting_concept_id",
    ],
)
visits_concept_mapped_died = visits_concept_mapped.ops(
    qo.ConditionSubstring("discharge_to_concept_name", "died"),
).run()
print(f"{len(visits_concept_mapped_died)} rows extracted!")
2025-01-29 09:51:05,366 INFO cycquery.orm    - Query returned successfully!
2025-01-29 09:51:05,367 INFO cycquery.utils.profile - Finished executing function run_query in 0.711447 s
5815 rows extracted!