cycquery.base.DatasetQuerier

class DatasetQuerier(dbms, user='', password='', host='', port=None, database='', schemas=None)

Bases: object

Base class to query EHR datasets.

db

ORM Database used to run queries.

Parameters:
  • dbms (str) – The database management system type (e.g., ‘postgresql’, ‘mysql’, ‘sqlite’).

  • user (str, optional) – The username for the database, by default empty. Not used for SQLite.

  • password (str, optional) – The password for the database, by default empty. Not used for SQLite.

  • host (str, optional) – The host address of the database, by default empty. Not used for SQLite.

  • port (int, optional) – The port number for the database, by default None. Not used for SQLite.

  • database (str, optional) – The name of the database or the path to the database file (for SQLite), by default empty.

  • schemas (Union[str, List[str]], optional) – The schema(s) to query, by default None.
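As a rough sketch of how these parameters fit together, the hypothetical helper below (connection_kwargs is not part of cycquery) assembles keyword arguments named after the class signature and leaves out the server-only ones for SQLite. The import path is taken from the module name at the top of this page; the file path, credentials, and schema name are placeholders.

```python
def connection_kwargs(dbms, database="", user="", password="",
                      host="", port=None, schemas=None):
    """Hypothetical helper: collect DatasetQuerier keyword arguments."""
    kwargs = {"dbms": dbms, "database": database, "schemas": schemas}
    if dbms != "sqlite":
        # user, password, host and port are documented as unused for SQLite.
        kwargs.update(user=user, password=password, host=host, port=port)
    return kwargs


def make_querier(**kwargs):
    """Build a querier; imported lazily so this file loads without cycquery."""
    from cycquery.base import DatasetQuerier
    return DatasetQuerier(**kwargs)


# Placeholder values for illustration only.
sqlite_kwargs = connection_kwargs("sqlite", database="./ehr.db")
postgres_kwargs = connection_kwargs(
    "postgresql", database="ehr", user="reader", password="secret",
    host="localhost", port=5432, schemas=["public"],
)
```

With cycquery installed and a reachable database, make_querier(**sqlite_kwargs) would then construct the querier.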

Notes

This class is intended to be subclassed. When it is instantiated, it automatically creates a method for each table in the database, named after the schema and table, i.e. self.schema_name.table_name(). Subclasses can add custom query methods that build on these generated methods.
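The naming pattern can be illustrated with a toy stand-in. This is not cycquery's implementation: ToyQuerier, ToyEHRQuerier, the catalog argument, and the hosp.patients names are all hypothetical, and the generated methods return plain strings rather than query objects.

```python
from types import SimpleNamespace


class ToyQuerier:
    """Toy stand-in that mimics the self.schema_name.table_name() pattern."""

    def __init__(self, catalog):
        # catalog maps a schema name to its table names (hypothetical input).
        for schema_name, table_names in catalog.items():
            namespace = SimpleNamespace()
            for table_name in table_names:
                method = self._make_table_method(schema_name, table_name)
                setattr(namespace, table_name, method)
            setattr(self, schema_name, namespace)

    @staticmethod
    def _make_table_method(schema_name, table_name):
        # A factory function binds the names for each generated method.
        def table_method():
            return f"SELECT * FROM {schema_name}.{table_name}"
        return table_method


class ToyEHRQuerier(ToyQuerier):
    """Subclass with a custom method that builds on a generated one."""

    def adult_patients(self):
        return self.hosp.patients() + " WHERE age >= 18"


querier = ToyEHRQuerier({"hosp": ["patients", "admissions"]})
```

Here querier.hosp.patients() exists only because the constructor attached it, and adult_patients() reuses it in the way a dataset-specific subclass might.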

Methods

  • get_table – Get a table and possibly map columns to have standard names.

  • list_columns – List columns in a table.

  • list_custom_tables – List custom table methods provided by the dataset API.

  • list_schemas – List schemas in the database to query.

  • list_tables – List table methods that can be queried using the database.

get_table(schema_name, table_name, cast_timestamp_cols=True)

Get a table and possibly map columns to have standard names.

Standardizing column names allows for columns to be recognized in downstream processing.

Parameters:
  • schema_name (str) – Name of schema in the database.

  • table_name (str) – Name of table in the database.

  • cast_timestamp_cols (bool, optional) – Whether to cast timestamp columns to datetime, by default True.

Returns:

Table with mapped columns.

Return type:

sqlalchemy.sql.selectable.Subquery
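Because the return type is a SQLAlchemy Subquery, its columns should be reachable through the standard .c collection. The sketch below assumes that, and mapped_column_names is a hypothetical helper rather than part of the library.

```python
def mapped_column_names(querier, schema_name, table_name):
    """Hypothetical helper: names of the columns on the returned subquery."""
    table = querier.get_table(schema_name, table_name, cast_timestamp_cols=True)
    # Subquery.c is SQLAlchemy's column collection; keys() yields column names.
    return list(table.c.keys())
```

Comparing this list with the output of list_columns for the same table is one way to see which names, if any, were mapped.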

list_columns(schema_name, table_name)

List columns in a table.

Parameters:
  • schema_name (str) – Name of schema in the database.

  • table_name (str) – Name of table in the database.

Returns:

List of column names.

Return type:

List[str]

list_custom_tables()

List custom table methods provided by the dataset API.

Returns:

List of custom table names.

Return type:

List[str]

list_schemas()

List schemas in the database to query.

Returns:

List of schema names.

Return type:

List[str]

list_tables(schema_name=None)

List table methods that can be queried using the database.

Parameters:

schema_name (Optional[str]) – Name of schema in the database, by default None.

Returns:

List of table names.

Return type:

List[str]
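The listing methods can be combined to survey a database before querying it. The sketch below is a hypothetical helper (describe_database is not part of the library). It assumes that a name returned by list_tables may be schema-qualified, such as "schema.table", and strips the prefix defensively; if the names are already bare, the split changes nothing.

```python
def describe_database(querier):
    """Return {schema: {table: [columns]}} using only the listing methods."""
    layout = {}
    for schema_name in querier.list_schemas():
        tables = {}
        for listed_name in querier.list_tables(schema_name):
            # Assumption: a listed name may be schema-qualified ("schema.table").
            table_name = listed_name.split(".")[-1]
            tables[table_name] = querier.list_columns(schema_name, table_name)
        layout[schema_name] = tables
    return layout
```

Any object that exposes list_schemas, list_tables, and list_columns with the signatures documented above can be passed in.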