cyclops.monitor.clinical_applicator.ClinicalShiftApplicator

class ClinicalShiftApplicator(shift_type, source, target, shift_id=None)

Bases: object

The ClinicalShiftApplicator class is used to induce synthetic clinical shifts.

Takes a dataset and generates a source and a target dataset that differ by a specified clinical shift. The shift is induced by splitting the original dataset along a categorical feature: rows matching the source values form the source dataset, and rows matching the target values form the target dataset.

Examples

>>> from cyclops.monitor.clinical_applicator import ClinicalShiftApplicator
>>> from cyclops.data.loader import load_nihcxr
>>> ds = load_nihcxr(path="/mnt/data/nihcxr")
>>> applicator = ClinicalShiftApplicator(
...     "hospital_type",
...     source=["hospital_type_1", "hospital_type_2"],
...     target=["hospital_type_3", "hospital_type_4", "hospital_type_5"],
... )
>>> ds_source, ds_target = applicator.apply_shift(ds)

Parameters:
  • shift_type (str) – Method used to induce shift in data. Options include: "time", "month", "hospital_type", "custom".

  • source (list) – List of values for source data.

  • target (list) – List of values for target data.

  • shift_id (str) – Column name for shift id. Default is None.
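
To make the roles of source, target, and shift_id concrete, here is a minimal pure-Python sketch of a categorical split. It is an illustration under stated assumptions, not the library's implementation: the helper split_by_values and the toy records are hypothetical, and plain lists of dictionaries stand in for a Hugging Face Dataset.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_values(
    records: List[Record],
    shift_id: str,
    source: List[Any],
    target: List[Any],
) -> Tuple[List[Record], List[Record]]:
    """Split records into source and target groups by the value in one column."""
    source_values, target_values = set(source), set(target)
    ds_source = [row for row in records if row[shift_id] in source_values]
    ds_target = [row for row in records if row[shift_id] in target_values]
    return ds_source, ds_target


# Toy data; the column name and values are made up for illustration.
records = [
    {"patient": 1, "hospital_type": "hospital_type_1"},
    {"patient": 2, "hospital_type": "hospital_type_3"},
    {"patient": 3, "hospital_type": "hospital_type_2"},
    {"patient": 4, "hospital_type": "hospital_type_6"},
]

ds_source, ds_target = split_by_values(
    records,
    shift_id="hospital_type",
    source=["hospital_type_1", "hospital_type_2"],
    target=["hospital_type_3", "hospital_type_4", "hospital_type_5"],
)
```

In this sketch, rows whose value appears in neither list (patient 4 here) end up in neither dataset.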

Methods

  • age – Apply age shift to dataset.

  • apply_shift – Apply shift to dataset using specified shift type.

  • custom – Build custom shift.

  • hospital_type – Apply shift for selection of hospital types.

  • month – Apply shift for selection of months.

  • sex – Apply shift for sex to dataset.

  • time – Apply time shift to dataset.

age(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply age shift to dataset.

Parameters:
  • dataset (huggingface Dataset) – Dataset to apply shift to.

  • shift_id (str) – Column name for shift id.

  • source (list) – List of values for source data.

  • target (list) – List of values for target data.

  • batched (bool) – Whether to use batching or not. Default is True.

  • batch_size (int) – Batch size. Default is 1000.

  • num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

  • ds_source (huggingface Dataset) – Dataset with source data.

  • ds_target (huggingface Dataset) – Dataset with target data.

apply_shift(dataset, batched=True, batch_size=1000, num_proc=1)

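Apply shift to dataset using specified shift type. The shift type, source and target values, and shift id are those passed to the constructor.

Parameters:
  • dataset (huggingface Dataset) – Dataset to apply shift to.

  • batched (bool) – Whether to use batching or not. Default is True.

  • batch_size (int) – Batch size. Default is 1000.

  • num_proc (int) – Number of processes to use. Default is 1.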

Return type:

Tuple[Dataset, Dataset]

Returns:

  • ds_source (huggingface Dataset) – Dataset with source data.

  • ds_target (huggingface Dataset) – Dataset with target data.
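
The call pattern (configure the shift in the constructor, then call apply_shift on a dataset) can be sketched with a toy class. This page does not show how the real class selects a method, so the dictionary-based dispatch, the class name ToyShiftApplicator, and the fallback column name used when shift_id is None are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Rows = List[dict]


class ToyShiftApplicator:
    """Hypothetical stand-in; not the cyclops implementation."""

    def __init__(
        self,
        shift_type: str,
        source: list,
        target: list,
        shift_id: Optional[str] = None,
    ) -> None:
        # Map each supported shift_type string to the method that handles it.
        self._shift_types: Dict[str, Callable[..., Tuple[Rows, Rows]]] = {
            "hospital_type": self.hospital_type,
        }
        if shift_type not in self._shift_types:
            raise ValueError(f"Unknown shift_type: {shift_type!r}")
        self.shift_type = shift_type
        self.source = source
        self.target = target
        # Assumption: fall back to a column named after the shift type.
        self.shift_id = shift_id or shift_type

    def hospital_type(
        self, rows: Rows, source: list, target: list, shift_id: str
    ) -> Tuple[Rows, Rows]:
        ds_source = [r for r in rows if r[shift_id] in source]
        ds_target = [r for r in rows if r[shift_id] in target]
        return ds_source, ds_target

    def apply_shift(self, rows: Rows) -> Tuple[Rows, Rows]:
        # Look up the configured method and call it with the stored settings.
        method = self._shift_types[self.shift_type]
        return method(rows, self.source, self.target, self.shift_id)


rows = [
    {"patient": 1, "hospital_type": "a"},
    {"patient": 2, "hospital_type": "b"},
    {"patient": 3, "hospital_type": "a"},
]
applicator = ToyShiftApplicator("hospital_type", source=["a"], target=["b"])
ds_source, ds_target = applicator.apply_shift(rows)
```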

custom(dataset, source, target, shift_id=None, batched=True, batch_size=1000, num_proc=1)

Build custom shift.

Build a custom shift by passing in a SliceSpec for source and target data.

Parameters:
  • dataset (huggingface Dataset) – Dataset to apply shift to.

  • source (SliceSpec) – SliceSpec for source data.

  • target (SliceSpec) – SliceSpec for target data.

  • shift_id (str) – Column name for shift id.

  • batched (bool) – Whether to use batching or not. Default is True.

  • batch_size (int) – Batch size. Default is 1000.

  • num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

  • ds_source (huggingface Dataset) – Dataset with source data.

  • ds_target (huggingface Dataset) – Dataset with target data.
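
SliceSpec itself is not documented on this page. As a rough illustration of what a custom shift expresses, the sketch below uses plain dictionaries as a hypothetical stand-in for a slice specification. The condition keys value, min_value, and max_value, the helper names, and the toy columns are assumptions, not the confirmed SliceSpec schema.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
Spec = Dict[str, Dict[str, Any]]  # column name -> conditions on that column


def matches(row: Record, spec: Spec) -> bool:
    """Return True if the row satisfies every condition in the spec."""
    for column, condition in spec.items():
        value = row[column]
        if "value" in condition and value != condition["value"]:
            return False
        if "min_value" in condition and value < condition["min_value"]:
            return False
        if "max_value" in condition and value > condition["max_value"]:
            return False
    return True


def custom_split(
    records: List[Record], source: Spec, target: Spec
) -> Tuple[List[Record], List[Record]]:
    """Build source and target groups from two independent specs."""
    ds_source = [row for row in records if matches(row, source)]
    ds_target = [row for row in records if matches(row, target)]
    return ds_source, ds_target


records = [
    {"patient": 1, "sex": "F", "age": 30},
    {"patient": 2, "sex": "M", "age": 70},
    {"patient": 3, "sex": "F", "age": 72},
]

# Source: female patients aged 65 or under. Target: anyone aged 66 or over.
source_spec = {"sex": {"value": "F"}, "age": {"max_value": 65}}
target_spec = {"age": {"min_value": 66}}

ds_source, ds_target = custom_split(records, source_spec, target_spec)
```

Unlike the value-list methods, a spec of this kind can combine conditions on several columns.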

hospital_type(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply shift for selection of hospital types.

Parameters:
  • dataset (huggingface Dataset) – Dataset to apply shift to.

  • shift_id (str) – Column name for shift id.

  • source (list) – List of values for source data.

  • target (list) – List of values for target data.

  • batched (bool) – Whether to use batching or not. Default is True.

  • batch_size (int) – Batch size. Default is 1000.

  • num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

  • ds_source (huggingface Dataset) – Dataset with source data.

  • ds_target (huggingface Dataset) – Dataset with target data.
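
The batched and batch_size parameters follow the Hugging Face datasets convention, where a batched function receives a dictionary of column lists and, for filtering, returns one boolean per row. The sketch below imitates that convention in pure Python. Whether this class forwards the parameters to Dataset.filter or Dataset.map is an assumption, and iter_batches and in_values_mask are hypothetical helpers.

```python
from typing import Dict, Iterator, List

Batch = Dict[str, list]


def iter_batches(columns: Batch, batch_size: int) -> Iterator[Batch]:
    """Yield column-wise slices holding at most batch_size rows each."""
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, batch_size):
        yield {
            name: values[start : start + batch_size]
            for name, values in columns.items()
        }


def in_values_mask(batch: Batch, column: str, values: list) -> List[bool]:
    """Return one boolean per row: True where the column value is allowed."""
    allowed = set(values)
    return [value in allowed for value in batch[column]]


# Toy column-oriented data; names and values are made up for illustration.
columns = {
    "patient": [1, 2, 3, 4, 5],
    "hospital_type": ["a", "b", "a", "c", "b"],
}

kept: List[int] = []
n_batches = 0
for batch in iter_batches(columns, batch_size=2):
    n_batches += 1
    mask = in_values_mask(batch, "hospital_type", ["a", "c"])
    kept.extend(p for p, keep in zip(batch["patient"], mask) if keep)
```

Larger batches mean fewer function calls; num_proc, by the same convention, would split the work across processes.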

month(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply shift for selection of months.

Parameters:
  • dataset (huggingface Dataset) – Dataset to apply shift to.

  • shift_id (str) – Column name for shift id.

  • source (list) – List of values for source data.

  • target (list) – List of values for target data.

  • batched (bool) – Whether to use batching or not. Default is True.

  • batch_size (int) – Batch size. Default is 1000.

  • num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

  • ds_source (huggingface Dataset) – Dataset with source data.

  • ds_target (huggingface Dataset) – Dataset with target data.
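
A month-based split can be pictured as follows. This is a sketch under assumptions the page does not confirm: that shift_id names a timestamp column, that timestamps are ISO-format strings, and that source and target hold month numbers from 1 to 12. The helper split_by_month is hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_month(
    records: List[Record],
    shift_id: str,
    source: List[int],
    target: List[int],
) -> Tuple[List[Record], List[Record]]:
    """Split records by the calendar month of an ISO-format timestamp column."""

    def month_of(row: Record) -> int:
        return datetime.fromisoformat(row[shift_id]).month

    ds_source = [row for row in records if month_of(row) in source]
    ds_target = [row for row in records if month_of(row) in target]
    return ds_source, ds_target


records = [
    {"patient": 1, "timestamp": "2020-01-15"},
    {"patient": 2, "timestamp": "2020-07-04"},
    {"patient": 3, "timestamp": "2020-12-31"},
    {"patient": 4, "timestamp": "2020-02-29"},
]

# Winter months as source, summer months as target.
ds_source, ds_target = split_by_month(
    records, shift_id="timestamp", source=[12, 1, 2], target=[6, 7, 8]
)
```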

sex(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply shift for sex to dataset.

Parameters:
  • dataset (huggingface Dataset) – Dataset to apply shift to.

  • shift_id (str) – Column name for shift id.

  • source (list) – List of values for source data.

  • target (list) – List of values for target data.

  • batched (bool) – Whether to use batching or not. Default is True.

  • batch_size (int) – Batch size. Default is 1000.

  • num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

  • ds_source (huggingface Dataset) – Dataset with source data.

  • ds_target (huggingface Dataset) – Dataset with target data.

time(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply time shift to dataset.

Parameters:
  • dataset (huggingface Dataset) – Dataset to apply shift to.

  • shift_id (str) – Column name for shift id.

  • source (list) – List of values for source data.

  • target (list) – List of values for target data.

  • batched (bool) – Whether to use batching or not. Default is True.

  • batch_size (int) – Batch size. Default is 1000.

  • num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

  • ds_source (huggingface Dataset) – Dataset with source data.

  • ds_target (huggingface Dataset) – Dataset with target data.
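
This page does not state how the source and target lists are interpreted for a time shift. One plausible reading, sketched below, treats each list as an inclusive [start, end] pair of ISO-format dates. That interpretation, the helper split_by_period, and the toy column name are assumptions for illustration only.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_period(
    records: List[Record],
    shift_id: str,
    source: List[str],
    target: List[str],
) -> Tuple[List[Record], List[Record]]:
    """Split records by whether a date column falls inside each period."""

    def in_period(row: Record, period: List[str]) -> bool:
        start, end = (date.fromisoformat(bound) for bound in period)
        return start <= date.fromisoformat(row[shift_id]) <= end

    ds_source = [row for row in records if in_period(row, source)]
    ds_target = [row for row in records if in_period(row, target)]
    return ds_source, ds_target


records = [
    {"patient": 1, "admit_date": "2019-03-10"},
    {"patient": 2, "admit_date": "2020-06-21"},
    {"patient": 3, "admit_date": "2019-12-31"},
    {"patient": 4, "admit_date": "2021-01-01"},
]

# Earlier period as source, later period as target.
ds_source, ds_target = split_by_period(
    records,
    shift_id="admit_date",
    source=["2019-01-01", "2019-12-31"],
    target=["2020-01-01", "2020-12-31"],
)
```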