GEMINI tutorial¶
This notebook shows examples of how to use the cycquery tool on GEMINI.
Imports and instantiate GEMINIQuerier
.¶
[1]:
"""GEMINI tutorial."""
import cycquery.ops as qo
from cycquery import GEMINIQuerier
querier = GEMINIQuerier(
host="db.gemini-hpc.ca",
database="delirium_v4_0_1",
user="username",
password="password",
)
# List all tables.
querier.list_tables("public")
2023-09-21 09:17:45,449 INFO cycquery.orm - Database setup, ready to run queries!
[1]:
['public.lookup_icd10_ca_description',
'public.lookup_statcan',
'public.lookup_cci',
'public.lookup_icd10_ca_to_ccsr',
'public.lookup_ip_administrative',
'public.lookup_lab_concept',
'public.lookup_vitals_concept',
'public.lookup_pharmacy_concept',
'public.lookup_diagnosis',
'public.locality_variables',
'public.admdad',
'public.derived_variables',
'public.ipscu',
'public.lookup_phy_characteristics',
'public.lab',
'public.ipintervention',
'public.lookup_ccsr',
'public.lookup_pharmacy_route',
'public.lookup_transfusion_concept',
'public.lookup_ip_scu',
'public.lookup_er_administrative',
'public.lookup_imaging',
'public.pharmacy',
'public.radiology',
'public.lookup_transfer',
'public.ipdiagnosis',
'public.lookup_room_transfer',
'public.er',
'public.erdiagnosis',
'public.erintervention',
'public.roomtransfer',
'public.transfusion',
'public.vitals',
'public.lookup_hospital_num']
Example 1a. Create a table with only one hospitalization per patient, keeping the most recent encounter for each patient. Sort the dataset by patient_id_hashed
and discharge_date_time
, and then keep the recent record.¶
[2]:
ops = qo.Sequential(
qo.OrderBy(
["patient_id_hashed", "discharge_date_time"],
ascending=[True, False],
),
qo.Distinct("patient_id_hashed"),
)
encounters = querier.public.admdad()
encounters = encounters.ops(ops).run()
print(f"{len(encounters)} rows extracted!")
2023-09-21 09:17:51,902 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:17:51,903 INFO cycquery.utils.profile - Finished executing function run_query in 6.093352 s
189734 rows extracted!
Example 1b. From the above set of encounters, take a subset of patients who were discharged between April 1, 2015 and March 31, 2016.¶
[3]:
ops = qo.Sequential(
qo.Cast("discharge_date_time", "timestamp"),
qo.ConditionAfterDate("discharge_date_time", "2015-04-01"),
qo.ConditionBeforeDate("discharge_date_time", "2016-03-31"),
)
encounters_query = querier.public.admdad()
encounters_query = encounters_query.ops(ops)
encounters = encounters_query.run()
print(f"{len(encounters)} rows extracted!")
2023-09-21 09:17:52,591 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:17:52,592 INFO cycquery.utils.profile - Finished executing function run_query in 0.675141 s
32567 rows extracted!
Example 1c. From the above set of encounters, get the total number of admissions for each hospital.¶
[4]:
ops = qo.GroupByAggregate("hospital_num", {"hospital_num": ("count", "count")})
encounters_per_site = encounters_query.ops(ops).run()
print(f"{len(encounters_per_site)} rows extracted!")
2023-09-21 09:17:52,856 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:17:52,857 INFO cycquery.utils.profile - Finished executing function run_query in 0.145693 s
7 rows extracted!
Example 2a. How many sodium tests were placed between Apr 1, 2015 and May 31, 2015 at hospital 101?¶
[5]:
encounters = querier.public.admdad()
encounters = encounters.ops(qo.ConditionEquals("hospital_num", 101))
lab_ops = qo.Sequential(
qo.DropEmpty("collection_date_time"),
qo.Cast("collection_date_time", "timestamp"),
qo.ConditionAfterDate("collection_date_time", "2015-04-01"),
qo.ConditionBeforeDate("collection_date_time", "2015-05-31"),
qo.ConditionSubstring("test_type_mapped", "sodium"),
)
labs = querier.public.lab()
labs = labs.ops(lab_ops)
encounters_labs = encounters.join(labs, on="genc_id")
encounters_labs = encounters_labs.ops(
qo.GroupByAggregate("hospital_num", {"hospital_num": ("count", "count")}),
)
sodium_tests = encounters_labs.run()
print(f"{len(sodium_tests)} rows extracted!")
print(sodium_tests)
2023-09-21 09:26:19,814 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:26:19,815 INFO cycquery.utils.profile - Finished executing function run_query in 506.939296 s
1 rows extracted!
hospital_num count
0 101 9305