cyclops.monitor.clinical_applicator.ClinicalShiftApplicator

class ClinicalShiftApplicator(shift_type, source, target, shift_id=None)

Bases: object

The ClinicalShiftApplicator class is used to induce synthetic clinical shifts.

Takes a dataset and generates a source dataset and a target dataset that differ by a specified clinical shift. The shift is induced by splitting the original dataset along a categorical feature: rows matching the source values form the source dataset, and rows matching the target values form the target dataset.

Examples

>>> from cyclops.monitor.clinical_applicator import ClinicalShiftApplicator
>>> from cyclops.data.loader import load_nihcxr
>>> ds = load_nihcxr(path="/mnt/data/nihcxr")
>>> applicator = ClinicalShiftApplicator(
...     "hospital_type",
...     source=["hospital_type_1", "hospital_type_2"],
...     target=["hospital_type_3", "hospital_type_4", "hospital_type_5"],
... )
>>> ds_source, ds_target = applicator.apply_shift(ds)
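The splitting idea described above can be illustrated without the library. The following is a minimal sketch in plain Python, not the cyclops implementation: the function name `split_by_category` and the sample records are hypothetical, and lists of dictionaries stand in for Hugging Face datasets.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_category(
    records: List[Record],
    column: str,
    source: List[Any],
    target: List[Any],
) -> Tuple[List[Record], List[Record]]:
    """Split records into source and target groups by the value of one column."""
    source_values, target_values = set(source), set(target)
    # A record lands in a group when its column value is listed for that group.
    ds_source = [r for r in records if r[column] in source_values]
    ds_target = [r for r in records if r[column] in target_values]
    return ds_source, ds_target


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "hospital_type": "hospital_type_1"},
    {"id": 2, "hospital_type": "hospital_type_3"},
    {"id": 3, "hospital_type": "hospital_type_2"},
    {"id": 4, "hospital_type": "hospital_type_5"},
]
ds_source, ds_target = split_by_category(
    records,
    "hospital_type",
    source=["hospital_type_1", "hospital_type_2"],
    target=["hospital_type_3", "hospital_type_4", "hospital_type_5"],
)
```

Records whose value appears in neither list are dropped from both groups in this sketch.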

Parameters:

shift_type (str) – Method used to induce shift in the data. Options include: "time", "month", "hospital_type", "custom".
source (list) – List of values for source data.
target (list) – List of values for target data.
shift_id (str) – Column name for shift id. Default is None.

Methods

`age`	Apply age shift to dataset.
`apply_shift`	Apply shift to dataset using specified shift type.
`custom`	Build custom shift.
`hospital_type`	Apply shift for selection of hospital types.
`month`	Apply shift for selection of months.
`sex`	Apply shift for sex to dataset.
`time`	Apply time shift to dataset.

age(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply age shift to dataset.

Parameters:

dataset (huggingface Dataset) – Dataset to apply shift to.
source (list) – List of values for source data.
target (list) – List of values for target data.
shift_id (str) – Column name for shift id.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
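The page does not say how the source and target lists are interpreted for an age shift. The sketch below assumes each list is an inclusive `[min, max]` age range and processes records in chunks to illustrate what `batched` and `batch_size` control. The helpers `iter_batches` and `split_by_age` are hypothetical names, not part of cyclops.

```python
from typing import Any, Dict, Iterator, List, Tuple

Record = Dict[str, Any]


def iter_batches(records: List[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield consecutive chunks of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def split_by_age(
    records: List[Record],
    shift_id: str,
    source: List[int],
    target: List[int],
    batch_size: int = 1000,
) -> Tuple[List[Record], List[Record]]:
    """Split by age, assuming source and target are inclusive [min, max] ranges."""
    src_min, src_max = source
    tgt_min, tgt_max = target
    ds_source: List[Record] = []
    ds_target: List[Record] = []
    for batch in iter_batches(records, batch_size):
        ds_source.extend(r for r in batch if src_min <= r[shift_id] <= src_max)
        ds_target.extend(r for r in batch if tgt_min <= r[shift_id] <= tgt_max)
    return ds_source, ds_target


# Hypothetical sample data, for illustration only.
patients = [{"id": i, "age": age} for i, age in enumerate([25, 47, 68, 81, 33])]
younger, older = split_by_age(
    patients, "age", source=[18, 50], target=[65, 100], batch_size=2
)
```

The batch size changes only how the work is chunked, not which records end up in each group.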

apply_shift(dataset, batched=True, batch_size=1000, num_proc=1)

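Apply shift to dataset using specified shift type.

Parameters:

dataset (huggingface Dataset) – Dataset to apply shift to.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.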

Return type:

Tuple[Dataset, Dataset]

Returns:

ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
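One plausible way to read "using specified shift type" is a lookup from the shift type name to the method that implements it. The sketch below shows that dispatch pattern with a hypothetical `ToyShiftApplicator`; it is not the cyclops source, and the fallback of using the shift type name as the column name when `shift_id` is None is an assumption made only for this example.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
Split = Tuple[List[Record], List[Record]]


class ToyShiftApplicator:
    """Hypothetical stand-in that dispatches on a shift type name."""

    def __init__(
        self,
        shift_type: str,
        source: List[Any],
        target: List[Any],
        shift_id: Optional[str] = None,
    ) -> None:
        # Map each supported shift type name to the method implementing it.
        handlers: Dict[str, Callable[..., Split]] = {
            "hospital_type": self.hospital_type,
        }
        if shift_type not in handlers:
            raise ValueError(f"Unknown shift type: {shift_type!r}")
        self._handler = handlers[shift_type]
        self.source = source
        self.target = target
        # Assumption for this sketch: default the column name to the shift type.
        self.shift_id = shift_id if shift_id is not None else shift_type

    def hospital_type(
        self, records: List[Record], source: List[Any], target: List[Any], shift_id: str
    ) -> Split:
        ds_source = [r for r in records if r[shift_id] in source]
        ds_target = [r for r in records if r[shift_id] in target]
        return ds_source, ds_target

    def apply_shift(self, records: List[Record]) -> Split:
        """Run the handler selected at construction time."""
        return self._handler(records, self.source, self.target, self.shift_id)


# Hypothetical sample data, for illustration only.
rows = [
    {"id": 1, "hospital_type": "academic"},
    {"id": 2, "hospital_type": "community"},
]
toy = ToyShiftApplicator("hospital_type", source=["academic"], target=["community"])
toy_source, toy_target = toy.apply_shift(rows)
```

Validating the shift type in the constructor means a misspelled name fails early rather than at the point where the shift is applied.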

custom(dataset, source, target, shift_id=None, batched=True, batch_size=1000, num_proc=1)

Build custom shift.

Build a custom shift by passing in a SliceSpec for source and target data.

Parameters:

dataset (huggingface Dataset) – Dataset to apply shift to.
source (SliceSpec) – SliceSpec for source data.
target (SliceSpec) – SliceSpec for target data.
shift_id (str) – Column name for shift id. Default is None.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
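A custom shift lets the source and target groups be defined by arbitrary filters rather than a single column. The sketch below conveys the idea with a plain dictionary that maps column names to allowed values. This dictionary is a hypothetical stand-in for a SliceSpec, whose real interface is not reproduced here, and `matches` and `custom_split` are illustrative names only.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]
ToySpec = Dict[str, List[Any]]  # hypothetical stand-in for a SliceSpec


def matches(record: Record, spec: ToySpec) -> bool:
    """Return True when every column in the spec holds an allowed value."""
    return all(record.get(column) in allowed for column, allowed in spec.items())


def custom_split(
    records: List[Record], source: ToySpec, target: ToySpec
) -> Tuple[List[Record], List[Record]]:
    """Build source and target groups from two independent filter specs."""
    ds_source = [r for r in records if matches(r, source)]
    ds_target = [r for r in records if matches(r, target)]
    return ds_source, ds_target


# Hypothetical sample data, for illustration only.
visits = [
    {"id": 1, "sex": "F", "hospital_type": "academic"},
    {"id": 2, "sex": "M", "hospital_type": "academic"},
    {"id": 3, "sex": "F", "hospital_type": "community"},
    {"id": 4, "sex": "M", "hospital_type": "community"},
]
custom_source, custom_target = custom_split(
    visits,
    source={"sex": ["F"], "hospital_type": ["academic"]},
    target={"sex": ["M"]},
)
```

Because the two specs are independent, they may constrain different columns, as in the usage above.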

hospital_type(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply shift for selection of hospital types.

Parameters:

dataset (huggingface Dataset) – Dataset to apply shift to.
source (list) – List of values for source data.
target (list) – List of values for target data.
shift_id (str) – Column name for shift id.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.

month(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply shift for selection of months.

Parameters:

dataset (huggingface Dataset) – Dataset to apply shift to.
source (list) – List of values for source data.
target (list) – List of values for target data.
shift_id (str) – Column name for shift id.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
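To make a month-based split concrete, the sketch below assumes that source and target are lists of month numbers (1 to 12) and that the shift column holds `datetime.date` values. Neither assumption is confirmed by this page, and `split_by_month` is a hypothetical helper, not a cyclops function.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_by_month(
    records: List[Record], shift_id: str, source: List[int], target: List[int]
) -> Tuple[List[Record], List[Record]]:
    """Split records by the calendar month of the date stored in shift_id."""
    source_months, target_months = set(source), set(target)
    ds_source = [r for r in records if r[shift_id].month in source_months]
    ds_target = [r for r in records if r[shift_id].month in target_months]
    return ds_source, ds_target


# Hypothetical sample data, for illustration only.
admissions = [
    {"id": 1, "admit_date": date(2020, 1, 15)},
    {"id": 2, "admit_date": date(2020, 7, 4)},
    {"id": 3, "admit_date": date(2020, 12, 30)},
    {"id": 4, "admit_date": date(2020, 6, 21)},
]
winter, summer = split_by_month(
    admissions, "admit_date", source=[12, 1, 2], target=[6, 7, 8]
)
```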

sex(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply shift for sex to dataset.

Parameters:

dataset (huggingface Dataset) – Dataset to apply shift to.
source (list) – List of values for source data.
target (list) – List of values for target data.
shift_id (str) – Column name for shift id.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.

time(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)

Apply time shift to dataset.

Parameters:

dataset (huggingface Dataset) – Dataset to apply shift to.
source (list) – List of values for source data.
target (list) – List of values for target data.
shift_id (str) – Column name for shift id.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.

Return type:

Tuple[Dataset, Dataset]

Returns:

ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
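The format of the source and target values for a time shift is not specified on this page. The sketch below assumes each list is a `[start, end]` pair of ISO date strings, treated as an inclusive window, and that the shift column holds `datetime.date` values. `in_window` and `split_by_time` are hypothetical names used only for illustration.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def in_window(day: date, window: List[str]) -> bool:
    """Return True when day falls inside the inclusive [start, end] window."""
    start, end = (date.fromisoformat(value) for value in window)
    return start <= day <= end


def split_by_time(
    records: List[Record], shift_id: str, source: List[str], target: List[str]
) -> Tuple[List[Record], List[Record]]:
    """Split records into two groups by the time window their date falls in."""
    ds_source = [r for r in records if in_window(r[shift_id], source)]
    ds_target = [r for r in records if in_window(r[shift_id], target)]
    return ds_source, ds_target


# Hypothetical sample data, for illustration only.
scans = [
    {"id": 1, "scan_date": date(2018, 3, 1)},
    {"id": 2, "scan_date": date(2019, 12, 31)},
    {"id": 3, "scan_date": date(2020, 1, 1)},
    {"id": 4, "scan_date": date(2021, 8, 9)},
]
before, after = split_by_time(
    scans,
    "scan_date",
    source=["2018-01-01", "2019-12-31"],
    target=["2020-01-01", "2021-12-31"],
)
```

Non-overlapping windows, as used here, keep any record from appearing in both groups.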