GEMINI tutorial

This notebook shows examples of how to use the cycquery tool on GEMINI.

Imports and instantiate GEMINIQuerier.

[1]:
"""GEMINI tutorial."""

import cycquery.ops as qo
from cycquery import GEMINIQuerier


querier = GEMINIQuerier(
    host="db.gemini-hpc.ca",
    database="delirium_v4_0_1",
    user="username",
    password="password",
)
# List all tables.
querier.list_tables("public")
2023-09-21 09:17:45,449 INFO cycquery.orm - Database setup, ready to run queries!
[1]:
['public.lookup_icd10_ca_description',
 'public.lookup_statcan',
 'public.lookup_cci',
 'public.lookup_icd10_ca_to_ccsr',
 'public.lookup_ip_administrative',
 'public.lookup_lab_concept',
 'public.lookup_vitals_concept',
 'public.lookup_pharmacy_concept',
 'public.lookup_diagnosis',
 'public.locality_variables',
 'public.admdad',
 'public.derived_variables',
 'public.ipscu',
 'public.lookup_phy_characteristics',
 'public.lab',
 'public.ipintervention',
 'public.lookup_ccsr',
 'public.lookup_pharmacy_route',
 'public.lookup_transfusion_concept',
 'public.lookup_ip_scu',
 'public.lookup_er_administrative',
 'public.lookup_imaging',
 'public.pharmacy',
 'public.radiology',
 'public.lookup_transfer',
 'public.ipdiagnosis',
 'public.lookup_room_transfer',
 'public.er',
 'public.erdiagnosis',
 'public.erintervention',
 'public.roomtransfer',
 'public.transfusion',
 'public.vitals',
 'public.lookup_hospital_num']

Example 1a. Create a table with only one hospitalization per patient, keeping the most recent encounter for each patient. Sort the dataset by patient_id_hashed and discharge_date_time, and then keep the recent record.

[2]:
ops = qo.Sequential(
    qo.OrderBy(
        ["patient_id_hashed", "discharge_date_time"],
        ascending=[True, False],
    ),
    qo.Distinct("patient_id_hashed"),
)
encounters = querier.public.admdad()
encounters = encounters.ops(ops).run()
print(f"{len(encounters)} rows extracted!")
2023-09-21 09:17:51,902 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:17:51,903 INFO cycquery.utils.profile - Finished executing function run_query in 6.093352 s
189734 rows extracted!

Example 1b. From the above set of encounters, take a subset of patients who were discharged between April 1, 2015 and March 31, 2016.

[3]:
ops = qo.Sequential(
    qo.Cast("discharge_date_time", "timestamp"),
    qo.ConditionAfterDate("discharge_date_time", "2015-04-01"),
    qo.ConditionBeforeDate("discharge_date_time", "2016-03-31"),
)
encounters_query = querier.public.admdad()
encounters_query = encounters_query.ops(ops)
encounters = encounters_query.run()
print(f"{len(encounters)} rows extracted!")
2023-09-21 09:17:52,591 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:17:52,592 INFO cycquery.utils.profile - Finished executing function run_query in 0.675141 s
32567 rows extracted!

Example 1c. From the above set of encounters, get the total number of admissions for each hospital.

[4]:
ops = qo.GroupByAggregate("hospital_num", {"hospital_num": ("count", "count")})
encounters_per_site = encounters_query.ops(ops).run()
print(f"{len(encounters_per_site)} rows extracted!")
2023-09-21 09:17:52,856 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:17:52,857 INFO cycquery.utils.profile - Finished executing function run_query in 0.145693 s
7 rows extracted!

Example 2a. How many sodium tests were placed between Apr 1, 2015 and May 31, 2015 at hospital 101?

[5]:
encounters = querier.public.admdad()
encounters = encounters.ops(qo.ConditionEquals("hospital_num", 101))
lab_ops = qo.Sequential(
    qo.DropEmpty("collection_date_time"),
    qo.Cast("collection_date_time", "timestamp"),
    qo.ConditionAfterDate("collection_date_time", "2015-04-01"),
    qo.ConditionBeforeDate("collection_date_time", "2015-05-31"),
    qo.ConditionSubstring("test_type_mapped", "sodium"),
)
labs = querier.public.lab()
labs = labs.ops(lab_ops)
encounters_labs = encounters.join(labs, on="genc_id")
encounters_labs = encounters_labs.ops(
    qo.GroupByAggregate("hospital_num", {"hospital_num": ("count", "count")}),
)
sodium_tests = encounters_labs.run()
print(f"{len(sodium_tests)} rows extracted!")
print(sodium_tests)
2023-09-21 09:26:19,814 INFO cycquery.orm - Query returned successfully!
2023-09-21 09:26:19,815 INFO cycquery.utils.profile - Finished executing function run_query in 506.939296 s
1 rows extracted!
   hospital_num  count
0           101   9305