OMOP tutorialΒΆ
This notebook shows examples of how to use the cycquery tool to query EHR databases that follow the OMOP common data model. Each query is limit to 100 rows (for quick results).
We showcase the examples on:
Synthea in OMOP format.
First, generate synthea data using their releases. We used v2.7.0 to generate data .
Follow instructions provided in ETL-Synthea to load the CSV data into a postgres database, and perform ETL to load the data into OMOP format.
Imports and instantiate OMOPQuerier
.ΒΆ
Pass in the schema_name
which is the name of the postgres schema which houses all the OMOP tables.
[1]:
"""OMOP tutorial."""
import pandas as pd
import cycquery.ops as qo
from cycquery import OMOPQuerier
querier = OMOPQuerier(
dbms="postgresql",
port=5432,
host="localhost",
database="synthea_integration_test",
user="postgres",
password="pwd",
schema_name="cdm_synthea10",
)
# List all tables.
querier.list_tables("cdm_synthea10")
2025-01-29 09:50:50,170 INFO cycquery.orm - Database setup, ready to run queries!
[1]:
['cdm_synthea10.all_visits',
'cdm_synthea10.assign_all_visit_ids',
'cdm_synthea10.care_site',
'cdm_synthea10.cdm_source',
'cdm_synthea10.cohort',
'cdm_synthea10.cohort_definition',
'cdm_synthea10.concept',
'cdm_synthea10.concept_ancestor',
'cdm_synthea10.concept_class',
'cdm_synthea10.concept_relationship',
'cdm_synthea10.concept_synonym',
'cdm_synthea10.condition_era',
'cdm_synthea10.condition_occurrence',
'cdm_synthea10.cost',
'cdm_synthea10.death',
'cdm_synthea10.device_exposure',
'cdm_synthea10.domain',
'cdm_synthea10.dose_era',
'cdm_synthea10.drug_era',
'cdm_synthea10.drug_exposure',
'cdm_synthea10.drug_strength',
'cdm_synthea10.episode',
'cdm_synthea10.episode_event',
'cdm_synthea10.fact_relationship',
'cdm_synthea10.final_visit_ids',
'cdm_synthea10.location',
'cdm_synthea10.measurement',
'cdm_synthea10.metadata',
'cdm_synthea10.note',
'cdm_synthea10.note_nlp',
'cdm_synthea10.observation',
'cdm_synthea10.observation_period',
'cdm_synthea10.payer_plan_period',
'cdm_synthea10.person',
'cdm_synthea10.procedure_occurrence',
'cdm_synthea10.provider',
'cdm_synthea10.relationship',
'cdm_synthea10.source_to_concept_map',
'cdm_synthea10.source_to_source_vocab_map',
'cdm_synthea10.source_to_standard_vocab_map',
'cdm_synthea10.specimen',
'cdm_synthea10.visit_detail',
'cdm_synthea10.visit_occurrence',
'cdm_synthea10.vocabulary']
Example 1. Get all patient visits in or after 2010.ΒΆ
[2]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits = visits.run(limit=100)
print(f"{len(visits)} rows extracted!")
pd.to_datetime(visits["visit_start_date"]).dt.year.value_counts().sort_index()
2025-01-29 09:50:51,479 INFO cycquery.orm - Query returned successfully!
2025-01-29 09:50:51,481 INFO cycquery.utils.profile - Finished executing function run_query in 0.078109 s
100 rows extracted!
[2]:
visit_start_date
2010 3
2011 2
2012 2
2013 5
2014 11
2015 6
2016 10
2017 8
2018 15
2019 10
2020 8
2021 10
2022 6
2023 4
Name: count, dtype: int64
Example 2. Get measurements for all visits in or after 2020.ΒΆ
[3]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2020-01-01"))
measurements = querier.measurement()
visits_measurements = visits.join(
join_table=measurements,
on="visit_occurrence_id",
).run(limit=100)
print(f"{len(visits_measurements)} rows extracted!")
2025-01-29 09:50:52,449 INFO cycquery.orm - Query returned successfully!
2025-01-29 09:50:52,451 INFO cycquery.utils.profile - Finished executing function run_query in 0.132927 s
100 rows extracted!
MIMIC-III v1.4 in OMOP format.
First, setup the MIMIC-III database according to the instructions in mimic-code.
Perform the ETL in the mimic-omop repo.
The database is assumed to be hosted using postgres. Update the config parameters such as username and password, passed to
MIMICIIIQuerier
accordingly.
Imports and instantiate OMOPQuerier
.ΒΆ
Pass in the schema_name
which is the name of the postgres schema which houses all the OMOP tables.
[4]:
querier = OMOPQuerier(
dbms="postgresql",
port=5432,
host="localhost",
database="mimiciii",
user="postgres",
password="pwd",
schema_name="omop",
)
# List all schemas.
querier.list_schemas()
2025-01-29 09:50:58,802 INFO cycquery.orm - Database setup, ready to run queries!
[4]:
['information_schema', 'mimiciii', 'omop', 'public']
Example 1. Get all patient visits that ended in a mortality outcome in or after 2010.ΒΆ
[5]:
visits = querier.visit_occurrence()
visits = visits.ops(qo.ConditionAfterDate("visit_start_date", "2010-01-01"))
visits_concept_mapped = querier.map_concept_ids_to_name(
visits,
[
"discharge_to_concept_id",
"admitting_concept_id",
],
)
visits_concept_mapped_died = visits_concept_mapped.ops(
qo.ConditionSubstring("discharge_to_concept_name", "died"),
).run()
print(f"{len(visits_concept_mapped_died)} rows extracted!")
2025-01-29 09:51:05,366 INFO cycquery.orm - Query returned successfully!
2025-01-29 09:51:05,367 INFO cycquery.utils.profile - Finished executing function run_query in 0.711447 s
5815 rows extracted!