cyclops.monitor.clinical_applicator.ClinicalShiftApplicator
- class ClinicalShiftApplicator(shift_type, source, target, shift_id=None)
Bases: object
The ClinicalShiftApplicator class is used to induce synthetic clinical shifts.
Takes a dataset and generates a source dataset and a target dataset with a specified clinical shift. The shift is induced by splitting the original dataset along a categorical feature: the source values select the rows of the source dataset, and the target values select the rows of the target dataset.
Examples

>>> from cyclops.monitor.clinical_applicator import ClinicalShiftApplicator
>>> from cyclops.data.loader import load_nihcxr
>>> ds = load_nihcxr(path="/mnt/data/nihcxr")
>>> applicator = ClinicalShiftApplicator(
...     "hospital_type",
...     source=["hospital_type_1", "hospital_type_2"],
...     target=["hospital_type_3", "hospital_type_4", "hospital_type_5"],
... )
>>> ds_source, ds_target = applicator.apply_shift(ds)
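To make the splitting idea concrete without loading any real data, here is a minimal sketch in plain Python. It is not the cyclops implementation: `split_by_category` is a hypothetical helper, the rows are made-up records, and the selection rule (membership of a column value in the source or target list) is an assumption based on the description above.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_category(
    rows: List[Row], column: str, source: List[Any], target: List[Any]
) -> Tuple[List[Row], List[Row]]:
    """Return (source_rows, target_rows) selected by the value of `column`."""
    source_values, target_values = set(source), set(target)
    ds_source = [row for row in rows if row[column] in source_values]
    ds_target = [row for row in rows if row[column] in target_values]
    return ds_source, ds_target


# Toy records; the ids and category values are made up for illustration.
rows = [
    {"id": 0, "hospital_type": "hospital_type_1"},
    {"id": 1, "hospital_type": "hospital_type_3"},
    {"id": 2, "hospital_type": "hospital_type_2"},
    {"id": 3, "hospital_type": "hospital_type_5"},
]
ds_source, ds_target = split_by_category(
    rows,
    "hospital_type",
    source=["hospital_type_1", "hospital_type_2"],
    target=["hospital_type_3", "hospital_type_4", "hospital_type_5"],
)
print([row["id"] for row in ds_source])  # [0, 2]
print([row["id"] for row in ds_target])  # [1, 3]
```

Rows whose value appears in neither list are simply left out of both splits in this sketch.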
- Parameters:
Methods
age – Apply age shift to dataset.
apply_shift – Apply shift to dataset using specified shift type.
custom – Build custom shift.
hospital_type – Apply shift for selection of hospital types.
month – Apply shift for selection of months.
sex – Apply shift for sex to dataset.
time – Apply time shift to dataset.
- age(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)
Apply age shift to dataset.
- Parameters:
dataset (huggingface Dataset) – Dataset to apply shift to.
shift_id (str) – Column name for shift id.
source (list) – List of values for source data.
target (list) – List of values for target data.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.
- Return type:
Tuple[Dataset, Dataset]
- Returns:
ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
- apply_shift(dataset, batched=True, batch_size=1000, num_proc=1)
Apply shift to dataset using specified shift type.
- Return type:
Tuple[Dataset, Dataset]
- Returns:
ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
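One way to picture "apply shift using specified shift type" is a lookup from the shift type name to the method that performs the split. The sketch below illustrates that dispatch pattern only; `ToyShiftApplicator` is a hypothetical stand-in operating on lists of dicts, and nothing here is taken from the cyclops source.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Split = Tuple[List[Row], List[Row]]


class ToyShiftApplicator:
    """Hypothetical stand-in for illustration; not the cyclops class."""

    def __init__(self, shift_type: str, source: List[Any], target: List[Any]) -> None:
        self.source = source
        self.target = target
        # Map each supported shift type name to the method that implements it.
        handlers: Dict[str, Callable[[List[Row]], Split]] = {
            "sex": self.sex,
            "hospital_type": self.hospital_type,
        }
        if shift_type not in handlers:
            raise ValueError(f"Unsupported shift type: {shift_type!r}")
        self._handler = handlers[shift_type]

    def _split(self, rows: List[Row], column: str) -> Split:
        # Select rows whose value in `column` is listed in source or target.
        ds_source = [row for row in rows if row[column] in self.source]
        ds_target = [row for row in rows if row[column] in self.target]
        return ds_source, ds_target

    def sex(self, rows: List[Row]) -> Split:
        return self._split(rows, "sex")

    def hospital_type(self, rows: List[Row]) -> Split:
        return self._split(rows, "hospital_type")

    def apply_shift(self, rows: List[Row]) -> Split:
        # Delegate to whichever method was selected at construction time.
        return self._handler(rows)


rows = [{"id": 0, "sex": "F"}, {"id": 1, "sex": "M"}, {"id": 2, "sex": "F"}]
applicator = ToyShiftApplicator("sex", source=["F"], target=["M"])
ds_source, ds_target = applicator.apply_shift(rows)
print(len(ds_source), len(ds_target))  # 2 1
```

Validating the name in the constructor means an unsupported shift type fails early rather than at the first call to `apply_shift`.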
- custom(dataset, source, target, shift_id=None, batched=True, batch_size=1000, num_proc=1)
Build custom shift.
Build a custom shift by passing in a SliceSpec for source and target data.
- Parameters:
dataset (huggingface Dataset) – Dataset to apply shift to.
source (SliceSpec) – SliceSpec for source data.
target (SliceSpec) – SliceSpec for target data.
shift_id (str) – Column name for shift id.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.
- Return type:
Tuple[Dataset, Dataset]
- Returns:
ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
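A custom shift selects source and target rows by arbitrary conditions rather than by a single column. The SliceSpec interface is not described on this page, so the sketch below substitutes plain predicate functions for it. `custom_split`, the column names, and the records are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def custom_split(
    rows: List[Row], source: Predicate, target: Predicate
) -> Tuple[List[Row], List[Row]]:
    """Select source and target rows with two independent conditions."""
    ds_source = [row for row in rows if source(row)]
    ds_target = [row for row in rows if target(row)]
    return ds_source, ds_target


# Made-up records for illustration.
rows = [
    {"id": 0, "age": 34, "sex": "F"},
    {"id": 1, "age": 71, "sex": "M"},
    {"id": 2, "age": 68, "sex": "F"},
    {"id": 3, "age": 25, "sex": "M"},
]

# Source: female patients under 65. Target: any patient aged 65 or older.
ds_source, ds_target = custom_split(
    rows,
    source=lambda row: row["sex"] == "F" and row["age"] < 65,
    target=lambda row: row["age"] >= 65,
)
print([row["id"] for row in ds_source])  # [0]
print([row["id"] for row in ds_target])  # [1, 2]
```

Because the two conditions are independent, a row can match both, one, or neither. Whether cyclops permits overlapping source and target slices is not stated here.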
- hospital_type(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)
Apply shift for selection of hospital types.
- Parameters:
dataset (huggingface Dataset) – Dataset to apply shift to.
shift_id (str) – Column name for shift id.
source (list) – List of values for source data.
target (list) – List of values for target data.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.
- Return type:
Tuple[Dataset, Dataset]
- Returns:
ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
- month(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)
Apply shift for selection of months.
- Parameters:
dataset (huggingface Dataset) – Dataset to apply shift to.
shift_id (str) – Column name for shift id.
source (list) – List of values for source data.
target (list) – List of values for target data.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.
- Return type:
Tuple[Dataset, Dataset]
- Returns:
ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
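As a rough illustration of a month-based shift, the sketch below splits records by the calendar month of a date column. It rests on several assumptions not confirmed by this page: that months are given as integers from 1 to 12, that the shift column holds `datetime.date` values, and that the column is called `admit_date`. `split_by_month` is a hypothetical helper.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_by_month(
    rows: List[Row], date_column: str, source: List[int], target: List[int]
) -> Tuple[List[Row], List[Row]]:
    """Split rows by the calendar month (1-12) of `date_column`."""
    source_months, target_months = set(source), set(target)
    ds_source = [row for row in rows if row[date_column].month in source_months]
    ds_target = [row for row in rows if row[date_column].month in target_months]
    return ds_source, ds_target


# Made-up admission dates for illustration.
rows = [
    {"id": 0, "admit_date": date(2020, 1, 15)},
    {"id": 1, "admit_date": date(2020, 7, 3)},
    {"id": 2, "admit_date": date(2021, 2, 28)},
    {"id": 3, "admit_date": date(2021, 8, 9)},
]

# Source: January to March. Target: July to September. The year is ignored.
ds_source, ds_target = split_by_month(
    rows, "admit_date", source=[1, 2, 3], target=[7, 8, 9]
)
print([row["id"] for row in ds_source])  # [0, 2]
print([row["id"] for row in ds_target])  # [1, 3]
```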
- sex(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)
Apply shift for sex to dataset.
- Parameters:
dataset (huggingface Dataset) – Dataset to apply shift to.
shift_id (str) – Column name for shift id.
source (list) – List of values for source data.
target (list) – List of values for target data.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.
- Return type:
Tuple[Dataset, Dataset]
- Returns:
ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
- time(dataset, source, target, shift_id, batched=True, batch_size=1000, num_proc=1)
Apply time shift to dataset.
- Parameters:
dataset (huggingface Dataset) – Dataset to apply shift to.
shift_id (str) – Column name for shift id.
source (list) – List of values for source data.
target (list) – List of values for target data.
batched (bool) – Whether to use batching or not. Default is True.
batch_size (int) – Batch size. Default is 1000.
num_proc (int) – Number of processes to use. Default is 1.
- Return type:
Tuple[Dataset, Dataset]
- Returns:
ds_source (huggingface Dataset) – Dataset with source data.
ds_target (huggingface Dataset) – Dataset with target data.
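Every method above accepts batched and batch_size. The sketch below shows in plain Python what batch-wise filtering generally looks like: rows are examined in chunks of at most batch_size, and the result is the same as filtering row by row. `iter_batches` and `filter_rows` are hypothetical helpers. How cyclops forwards these options internally is not shown on this page, and num_proc (parallel workers) is left out for simplicity.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def iter_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield consecutive chunks of at most `batch_size` rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start : start + batch_size]


def filter_rows(
    rows: List[Row],
    keep: Callable[[Row], bool],
    batched: bool = True,
    batch_size: int = 1000,
) -> List[Row]:
    """Keep rows satisfying `keep`, one at a time or one batch at a time."""
    if not batched:
        return [row for row in rows if keep(row)]
    kept: List[Row] = []
    for batch in iter_batches(rows, batch_size):
        # Each batch is processed as a unit; row order is preserved.
        kept.extend(row for row in batch if keep(row))
    return kept


rows = [{"id": i, "sex": "F" if i % 2 == 0 else "M"} for i in range(2500)]
batch_sizes = [len(batch) for batch in iter_batches(rows, 1000)]
print(batch_sizes)  # [1000, 1000, 500]

females_batched = filter_rows(rows, lambda row: row["sex"] == "F")
females_unbatched = filter_rows(rows, lambda row: row["sex"] == "F", batched=False)
print(len(females_batched), females_batched == females_unbatched)  # 1250 True
```

Batching changes how much data is handled per step, not which rows are selected, so the choice of batch_size is a performance setting and not a semantic one.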